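# library() stops with an error if a package is missing, whereas require()
# only warns and returns FALSE
library(plotly)
library(tidyverse)
library(ggridges)
library(cowplot)
library(RColorBrewer)
library(grid)
library(ggtext)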
old <- theme_set(theme_bw(base_size = 16))

This is the second part of the analysis. In the first part (see ../input/PHO5-data/20231019-pool-qc-PHO5.Rmd), I did QC and exported the filtered dataset. Here, I will continue working with that dataset and answer our biological questions.

Goal

  • Analyze the flow cytometry results for the full chimera set with the PHO5pr-mCherry reporter.
  • Develop an analysis pipeline to perform QC, apply corrections (if needed) and plot the results.

Data

Import the background subtracted data

dat0 <- read_tsv("../input/20231023-PHO5-bg-subtracted-data.tsv", col_types = "ccccdddddc")

Filter the data

dat <- filter(dat0, host != "PHO84", flag == "pass", date != "02/10") %>% 
  # based on previous QC, the following sample (both replicates) have high
  # variance - one biological replicate is highly expressed, while the other 
  # two have mNeon, but barely any RFP expression.
  mutate(
    host = fct_recode(host, pho2 = "pho2∆"),
    flag = ifelse(plasmid == "233" & host == "pho2", "high.var", flag))

Number of replicates left for each sample

expt <- dat %>% 
  filter(host %in% c("PHO2", "pho2"), !plasmid %in% c("188", "194")) %>% 
  group_by(date, plasmid, host) %>% 
  summarize(n = n(), .groups = "drop")

expt %>% 
  ggplot(aes(x = plasmid, y = n)) +
  geom_col(aes(fill = host)) + 
  facet_grid(date ~ .) +
  scale_fill_manual(values = c("PHO2" = "gray30", "pho2" = "gray70")) +
  theme_minimal() + background_grid(major = "none") + panel_border(size = 0.5) +
  scale_y_continuous(name = "Replicates", breaks = c(6)) + xlab(NULL) +
  theme(axis.text.x = element_text(angle = 90),
        strip.text.y = element_text(angle = 0),
        legend.position = "top")

Chimera makeup information

meta <- read_tsv("../input/20230208-chimera-Pho4-makeup.txt", col_types = "ccccc")

Summarize data

Here we would like to calculate the RFP/GFP ratio for each chimera (plasmid) across all replicates, including those from different days. Note that the parameter of interest is a ratio, which can be estimated as either a “mean of ratios” or a “ratio of means”. These are two specific instances of a more general weighted estimator and differ only in the choice of weights. The “mean of ratios” first calculates the ratio for each replicate within a plasmid, then averages them; each replicate receives an equal weight of 1/n. The “ratio of means” first sums the GFP and RFP values separately across the replicates for each plasmid, then takes the ratio of the two sums. In this estimator, the weight for each replicate is x / sum(x), where x is the denominator of the ratio, i.e., GFP. In other words, this estimator gives more weight to the replicates in which the chimera had a higher expression level.

Both estimators are known to be biased; we will ignore that for the moment. As for the choice between the two, there seems to be no reason to give more weight to the experiments with a higher GFP signal, so the “mean of ratios” seems the more natural choice. However, we will calculate both and decide later.
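To make the weighting argument concrete, here is a minimal sketch on made-up numbers (the `x` and `y` values are hypothetical and not taken from the dataset). It checks that the ratio of means equals a weighted mean of the per-replicate ratios with weights x / sum(x), while the mean of ratios uses equal weights.

```r
# Hypothetical GFP (x, denominator) and RFP (y, numerator) values for one plasmid
x <- c(100, 200, 400)
y <- c(1000, 2400, 6000)

ratios <- y / x                      # per-replicate RFP/GFP

mean_of_ratios <- mean(ratios)       # equal weights, 1/n each
ratio_of_means <- mean(y) / mean(x)  # same as sum(y) / sum(x)

# The ratio of means is a weighted mean of the ratios with weights x / sum(x)
w <- x / sum(x)
weighted_mean_of_ratios <- sum(w * ratios)

stopifnot(isTRUE(all.equal(ratio_of_means, weighted_mean_of_ratios)))
c(mean_of_ratios = mean_of_ratios, ratio_of_means = ratio_of_means)
```

In this toy example the replicate with the highest GFP also has the highest ratio, so the ratio of means is pulled above the mean of ratios.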

A final question is how to calculate the variance of the ratio estimate. According to the survey package manual, an approximate estimator for the variance is

\[ \begin{aligned} r &= \frac{\bar{y}}{\bar{x}}, \quad \text{where}\ \bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_i\ \text{and}\ \bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i \\ \hat{V}(r) &= \left(1-\frac{n}{N}\right)\left(\frac{1}{\bar{x}^2}\right)\frac{s_r^2}{n}, \quad \text{where}\ s_r^2=\frac{1}{n-1}\sum_{i=1}^{n}(y_i-rx_i)^2 \end{aligned} \]

Assuming that N>>n, we can ignore the first term (1 - n/N) in the variance estimator. The rest can be calculated from the data:

datsum <- dat %>%
  filter(!is.na(plasmid)) %>% 
  group_by(plasmid, host) %>% 
  summarize(
     n = n(),
    mG = mean(BL1.H),
    mR = mean(YL2.H),
     A = mean(YL2.H/BL1.H),
     r = mR/mG,
    s2 = 1/(n-1)*sum((YL2.H - r*BL1.H)^2),
    vr = 1/(mG^2)*s2/n,
    se = sqrt(vr),
    .groups = "drop"
  ) %>% 
  select(-s2, -r, -vr)# %>% 
  #pivot_wider(names_from = host, values_from = BL1.H:`nR/G`) %>% 
  #mutate(`pho2∆/PHO2` = `R/G_pho2∆`/`R/G_PHO2`,
  #       `n.pho2∆/PHO2` = `nR/G_pho2∆`/`nR/G_PHO2`)
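As a cross-check of the formula, the same calculation can be written as a small standalone function. This is only a sketch: `ratio_se()` and the toy vectors are hypothetical names and values introduced here, and the finite population term (1 - n/N) is dropped as in the summary above.

```r
# Ratio of means and its approximate standard error, ignoring (1 - n/N)
ratio_se <- function(x, y) {
  n  <- length(x)
  r  <- mean(y) / mean(x)               # ratio of means
  s2 <- sum((y - r * x)^2) / (n - 1)    # residual variance around y = r * x
  se <- sqrt(s2 / (n * mean(x)^2))      # sqrt of V(r) = s2 / (n * xbar^2)
  c(r = r, se = se)
}

# Hypothetical GFP (x) and RFP (y) values for one plasmid
x <- c(100, 200, 400)
y <- c(1000, 2400, 6000)
ratio_se(x, y)

# If y is exactly proportional to x, the standard error is zero
ratio_se(x, 2 * x)
```

Note that this standard error is derived for the ratio of means `r`, whereas the summary table keeps the mean of ratios `A` as the point estimate, so the `se` column is best read as an approximation when paired with `A`.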

For each chimera, we would also like to calculate three values:

  1. A in pho2∆: this is its base activity without Pho2
  2. A in PHO2: this is its full activity with Pho2
  3. A_PHO2 / A_pho2∆: this is the Pho2 enhancement of activity

We assign the chimeras into several groups, based on their A_PHO2 and A_PHO2/A_pho2∆

ximera <- datsum %>%
  pivot_wider(id_cols = plasmid, names_from = host,
              values_from = c(A, se)) %>% 
  mutate(
    rA_PHO2 = A_PHO2 / A_PHO2[plasmid == "194"],
    rA_pho2 = A_pho2 / A_pho2[plasmid == "194"],
    boost = A_PHO2 / A_pho2,
    group = case_when(
      plasmid %in% c("188", "194") ~ "ref",
      rA_PHO2 < 0.2                ~ "n.f.",
      .default = "chimera"
    ),
    group = fct_relevel(group, "ref", "chimera", "n.f.")
  ) %>% 
  left_join(select(meta, plasmid, set, symbol), by = "plasmid") %>% 
  mutate(symbol = fct_reorder(symbol, rA_PHO2, .desc = TRUE)) %>% 
  relocate(c(set, symbol, group), .after = plasmid)

Export the summarized data

write_tsv(ximera, file = "../output/20231125-PHO5pr-chimera-summarized.tsv")

To be able to plot all the data points, let's generate another data frame with the individual ratios.


dat_sep <- dat %>%
  filter(!is.na(plasmid)) %>% 
  mutate(A = YL2.H/BL1.H) %>% 
  select(plasmid, host, BL1.H, YL2.H, A, flag) %>% 
  left_join(select(meta, plasmid, set, symbol), by = "plasmid")# %>% 
  #mutate(symbol = fct_reorder(symbol, rA_PHO2, .desc = TRUE))# %>% 

Analysis

Plotting functions

source("../script/20240211-chimera-data-plotting-functions.R")

Source the scripts

my_plot_ratio <- function(selection){
  # custom colors for this function
  date.colors = c(brewer.pal(name="Dark2", n = 8), brewer.pal(name="Paired", n = 8))
  host.colors = c("PHO2" = "gray30", "pho2" = "gray70")
  point.colors = c("PHO2" = "forestgreen", "pho2" = "purple4")
  # prepare data
  tmp <- my_data_prep(selection)
  # plotting
  p <- tmp %>% 
    select(-c(FSC.H, nGFP, nRFP, flag)) %>% 
    mutate(`R/G` = YL2.H/BL1.H) %>% 
    pivot_longer(cols = c(BL1.H, YL2.H, `R/G`), 
                 names_to = "parameter", values_to = "value") %>% 
    mutate(parameter = factor(parameter, levels = c("R/G", "YL2.H", "BL1.H"),
                              labels = c("RFP/GFP", "PHO5pRFP", "Pho4-GFP"))) %>% 
    ggplot(aes(x = symbol, y = value, group = host)) + 
    stat_summary(aes(group = host), fun.data = "mean_cl_boot", geom = "errorbar",
                 position = position_dodge(0.5), width = 0.3) +
    geom_bar(aes(fill = host), width = 0.5, alpha = 0.8,
             stat = "summary", fun = "mean", position = position_dodge(0.5)) +
    geom_point(data = function(x) subset(x, !symbol %in% c("CCCCC", "SSSSS")),
               aes(group = host, color = date), size = 1, shape = 3, alpha = 0.9,
               position = position_jitterdodge(dodge.width = 0.5, jitter.width = 0.1)) +
    scale_color_manual(values = date.colors, guide = "none") +
    #geom_point(data = function(x) subset(x, !symbol %in% c("CCCCC", "SSSSS")),
    #           aes(group = host, color = host), size = 1, shape = 3, alpha = 0.9,
    #           position = position_jitterdodge(dodge.width = 0.5, jitter.width = 0.1)) +
    #scale_color_manual(values = point.colors) +
    scale_fill_manual(values = host.colors) +
    facet_grid(parameter~group, scales = "free", space = "free_x") +
    theme_bw(base_size = 18) + background_grid(minor = "none") + 
    xlab("Pho4 chimera") +
    theme(axis.text.x = element_text(angle = 30, hjust = 1, family = "mono"),
          legend.position = "top",
          axis.title = element_blank())
  return(p) 
}

Modify the component plotting function for special purposes

host.labels = c("PHO2", "pho2∆")
point.colors = c("PHO2" = "forestgreen", "pho2" = "purple4")
p1 <- dat %>% 
  filter(!is.na(plasmid)) %>% 
  mutate(plasmid = fct_reorder(plasmid, BL1.H, .fun = median) %>% 
           fct_relevel("194", "188")) %>% 
  ggplot(aes(x = plasmid, y = BL1.H)) +
  geom_point(aes(color = host), position = position_jitter(0.1),
             size = 1.1) + 
  scale_color_manual("Host", values = point.colors, labels = host.labels) +
  scale_y_log10(breaks = c(100, 1000, 10000), expand = expansion(mult = 0.1)) +
  scale_x_discrete(expand = expansion(mult = 0.03)) +
  xlab("Pho4 constructs") + ylab("Pho4-mNeon (a.u.)") +
  theme_cowplot() + panel_border(color = "gray30", size = 1.2) +
  theme(axis.text.x = element_text(angle = 90, size = rel(0.6), vjust = 0.5),
        axis.text.y = element_text(size = rel(0.8)),
        axis.title = element_text(size = rel(0.9)),
        axis.line = element_blank(),
        legend.position = c(0.05, 0.9),
        legend.direction = "horizontal",
        legend.text = element_text(face = 3))
Warning: A numeric `legend.position` argument in `theme()` was deprecated in ggplot2 3.5.0.
Please use the `legend.position.inside` argument of `theme()` instead.
p1

#ggsave("../img/20240307-Pho4-chimera-protein-level-variation.png", 
#       width = 6, height = 3)

Pho4 chimera protein level variation

What is the distribution of Pho4 chimera protein levels? Is the mCherry/mNeon ratio a faithful measure of the chimera’s activities?

Distribution of Pho4-mNeon levels grouped by plasmid and host.

tmp <- dat %>% 
  filter(plasmid == "194", host == "PHO2")

lm <- lm(YL2.H ~ BL1.H, data = tmp)
summary(lm)

Call:
lm(formula = YL2.H ~ BL1.H, data = tmp)

Residuals:
    Min      1Q  Median      3Q     Max 
-6194.6 -1651.7  -169.8  1417.1  6612.2 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 9562.8515  1114.3148   8.582 1.48e-14 ***
BL1.H         10.4668     0.7911  13.230  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2284 on 142 degrees of freedom
Multiple R-squared:  0.5521,    Adjusted R-squared:  0.5489 
F-statistic:   175 on 1 and 142 DF,  p-value: < 2.2e-16
p2 <- tmp %>% 
  ggplot(aes(x = BL1.H, y = YL2.H)) +
  geom_point(size = 1.5) + 
  stat_smooth(method = "lm", formula = y ~ x) +
  xlab("Pho4-mNeon") + ylab("PHO5pr-mCherry") +
  theme_cowplot() + 
  panel_border(color = "gray30", size = 1.2) +
  theme(axis.line = element_blank(),
        axis.title = element_text(size = rel(1.2))
  )

p2
ggsave("../img/20240307-ScPho4-mCherry-vs-mNeon-consistent.png", 
       width = 4, height = 3.5)
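One way to read the regression above against the question of whether the mCherry/mNeon ratio is a faithful measure: a ratio is independent of the Pho4-mNeon level only if the fitted line passes through the origin. The sketch below takes the printed coefficients at face value and computes the ratio implied by the line at a few GFP levels. The `gfp` values are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the lm() summary above (ScPho4 in the PHO2 host)
b0 <- 9562.8515   # intercept
b1 <- 10.4668     # slope for BL1.H

# Hypothetical Pho4-mNeon levels (a.u.)
gfp <- c(1000, 1500, 2000)

# If YL2.H = b0 + b1 * BL1.H, then YL2.H / BL1.H = b1 + b0 / BL1.H
implied_ratio <- b1 + b0 / gfp
data.frame(gfp = gfp, implied_ratio = implied_ratio)
```

Under this reading, the implied ratio decreases as the mNeon level increases, which is something to keep in mind when comparing chimeras with very different protein levels.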

All chimera (ScPho4 and CgPho4 removed) protein levels by host

host.colors =  c("PHO2" = "gray60", "pho2" = "orange")

tmp <- dat %>% 
  filter(plasmid %in% c("188", "194"), 
         date %in% c("02/08", "02/11", "02/18", "02/21", "02/23", "03/31")) %>% 
  mutate(A = YL2.H/BL1.H,
         Pho4 = factor(plasmid, levels = c("194", "188"), 
                       labels = c("ScPho4", "CgPho4"))) 
p3 <- tmp %>% 
  ggplot(aes(x = date, y = A)) + 
  geom_bar(aes(fill = host), stat = "summary", fun = "mean", 
           position = position_dodge(0.9), alpha = 0.9) +
  geom_point(aes(group = host), size = 0.6, shape = 3,
             position = position_dodge(width = 0.9)) + 
  scale_fill_manual("Host", values = host.colors, labels = host.labels) +
  scale_x_discrete(labels = 1:6) +
  #stat_summary(fun.data = "mean_se", geom = "pointrange", color = "red") +
  facet_grid(Pho4 ~ .) +
  ylab("mCherry/mNeon") + xlab("Replicate") +
  theme_cowplot() + panel_border(color = "gray30", size = 1.2)+
  theme(axis.text = element_text(size = rel(0.7)),
        axis.title = element_text(size = rel(1)),
        axis.line = element_blank(),
        strip.background = element_blank(),
        legend.position = "top",
        legend.title = element_text(size = rel(0.9)),
        legend.text = element_text(size = rel(0.8), face = 3))
p3

#ggsave("../img/20240307-CgPho4-mCherry-vs-mNeon-consistent.png", 
#       width = 4, height = 3.2)

# sample size per day of experiment
tmp %>% count(date, Pho4, host) %>% 
  pivot_wider(names_from = host, values_from = n)
cv <- dat %>% 
  select(-nGFP, -nRFP) %>%
  pivot_longer(FSC.H:YL2.H, names_to = "parameter", values_to = "intensity") %>% 
  group_by(date, plasmid, host, parameter) %>% 
  summarize(
    n = n(),
    mean = mean(intensity),
    cv = sd(intensity)/mean(intensity),
    .groups = "drop"
  ) %>% 
  arrange(desc(cv))

High variance samples

Summarize the background subtracted data by calculating the means and cv for each strain.

control <- filter(dat, plasmid == "194", host == "PHO2") %>% 
  separate(well, into = c("row", "col"), sep = 1) %>% 
  droplevels()

Use the control strain (pH194 with PHO2) to identify and correct for systematic biases

gfp.model.0 <- lm(BL1.H ~ log10(events) + date + row*col, data = control)
step(gfp.model.0)
Start:  AIC=1507.62
BL1.H ~ log10(events) + date + row * col

                Df Sum of Sq     RSS    AIC
- row:col        6     96964 3731743 1499.4
- log10(events)  1     12751 3647531 1506.1
<none>                       3634779 1507.6
- date          11   3415318 7050097 1581.0

Step:  AIC=1499.41
BL1.H ~ log10(events) + date + row + col

                Df Sum of Sq     RSS    AIC
- row            3     92309 3824052 1496.9
- log10(events)  1      9919 3741662 1497.8
<none>                       3731743 1499.4
- col            2   1030496 4762240 1530.5
- date          11   3409784 7141528 1570.9

Step:  AIC=1496.93
BL1.H ~ log10(events) + date + col

                Df Sum of Sq     RSS    AIC
- log10(events)  1     16207 3840259 1495.5
<none>                       3824052 1496.9
- col            2   1024530 4848582 1527.1
- date          11   3423810 7247862 1567.0

Step:  AIC=1495.54
BL1.H ~ date + col

       Df Sum of Sq     RSS    AIC
<none>              3840259 1495.5
- col   2   1068825 4909084 1526.9
- date 11   3422018 7262277 1565.3

Call:
lm(formula = BL1.H ~ date + col, data = control)

Coefficients:
(Intercept)    date02/09    date02/11    date02/16    date02/18    date02/19    date02/20    date02/21    date02/22  
     1827.3       -209.7       -259.9       -351.5       -242.3       -319.0       -499.0       -343.8       -233.1  
  date02/23    date03/30    date03/31         col5         col9  
     -439.5       -533.6       -569.1       -107.4       -211.0  
gfp.model.1 <- lm(BL1.H ~ date + col, data = control)

Model for mNeon
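In the chunk above, the model chosen by `step()` is retyped by hand as `gfp.model.1`. Since `step()` returns the final fitted model, it can also be assigned directly. Below is a minimal sketch on the built-in `mtcars` data, used here only because the control data frame is not available in this example; the predictors are arbitrary.

```r
# Full model on a built-in dataset (stand-in for the control data)
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Backward selection by AIC; trace = 0 suppresses the step-by-step printout
selected <- step(full, trace = 0)

# The returned object is an ordinary lm fit and can be used directly
formula(selected)
extractAIC(selected)[2] <= extractAIC(full)[2]
```

Assigning the result avoids the risk of the retyped formula drifting from what `step()` selected if the data or the starting model change.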

rfp.model.0 <- lm(YL2.H ~ log10(events) + date + row*col, data = control)
step(rfp.model.0)
Start:  AIC=2229.67
YL2.H ~ log10(events) + date + row * col

                Df Sum of Sq        RSS    AIC
- row:col        6  13201086  560400308 2221.1
<none>                        547199222 2229.7
- log10(events)  1  13838563  561037785 2231.3
- date          11 852447575 1399646797 2342.9

Step:  AIC=2221.11
YL2.H ~ log10(events) + date + row + col

                Df Sum of Sq        RSS    AIC
<none>                        560400308 2221.1
- log10(events)  1  12713396  573113704 2222.3
- col            2  65908721  626309029 2233.1
- row            3 154001100  714401408 2250.1
- date          11 851423159 1411823467 2332.2

Call:
lm(formula = YL2.H ~ log10(events) + date + row + col, data = control)

Coefficients:
  (Intercept)  log10(events)      date02/09      date02/11      date02/16      date02/18      date02/19  
      19089.8         3496.2        -6513.6        -4709.4        -6221.7        -3241.9        -6452.8  
    date02/20      date02/21      date02/22      date02/23      date03/30      date03/31           rowC  
      -4558.0        -4776.0        -3995.3        -5054.2        -8507.9       -10061.8        -1882.5  
         rowE           rowG           col5           col9  
      -2429.8        -2641.7         -954.5        -1663.5  
rfp.model.1 <- lm(YL2.H ~ log10(events) + date + row + col, data = control)

Model for PHO5pr::RFP

tmp <- dat %>% 
  # remove one sample with only one valid day of experiment
  filter(!(plasmid == "218" & host == "PHO2"), !plasmid %in% c("188", "194", NA)) %>% 
  nest(data = c(date, BL1.H, YL2.H), .by = c(plasmid, host))

day.var.gfp <- tmp %>% 
  mutate(model = map(data, function(df) lm(BL1.H ~ date, data = df)),
         tidied = map(model, broom::tidy)) %>% 
  unnest(tidied) %>% 
  filter(term != "(Intercept)") %>% 
  mutate(p.adj = p.adjust(p.value, method = "BH")) %>% 
  select(-data, -model) %>% 
  filter(p.adj < 0.10) %>% 
  arrange(plasmid, host)

day.var.rfp <- tmp %>% 
  mutate(model = map(data, function(df) lm(YL2.H ~ date, data = df)),
         tidied = map(model, broom::tidy)) %>% 
  unnest(tidied) %>% 
  filter(term != "(Intercept)") %>% 
  mutate(p.adj = p.adjust(p.value, method = "BH")) %>% 
  select(-data, -model) %>% 
  filter(p.adj < 0.10) %>% 
  arrange(plasmid, host)

There are more systematic shifts in RFP than in GFP, with significant effects of row, column, date and also the number of events. However, I won’t be removing these effects yet, because I’ve found that the RFP/GFP ratios are fairly consistent across days. In other words, the day-to-day variation in GFP and RFP may largely cancel out in the ratio.
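The cancellation argument can be illustrated with a small simulation. This is a sketch under a strong assumption, namely that the day effect is a purely multiplicative factor shared by both channels. All numbers (`day_scale`, the signal levels and noise) are invented for illustration.

```r
set.seed(1)
n_day <- 6
n_rep <- 8
day <- factor(rep(seq_len(n_day), each = n_rep))

# Assumed: one multiplicative gain per day, shared by GFP and RFP
day_scale <- rep(runif(n_day, 0.6, 1.4), each = n_rep)

gfp <- 1500 * day_scale * exp(rnorm(n_day * n_rep, sd = 0.05))
rfp <- 15000 * day_scale * exp(rnorm(n_day * n_rep, sd = 0.05))
ratio <- rfp / gfp

# Fraction of variance explained by day for each quantity
r2_day <- function(v) summary(lm(v ~ day))$r.squared
c(gfp = r2_day(gfp), rfp = r2_day(rfp), ratio = r2_day(ratio))
```

If the real day effects are partly additive, or act differently on the two channels (as the different sets of terms retained in the two models above might suggest), the cancellation would only be partial.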

Check for each plasmid how consistent are the measurements between days

# extract ximera names
refs <- c("188","194")
# make a test set
day.var.gfp.list <- unique(day.var.gfp$plasmid)
day.var.rfp.list <- unique(day.var.rfp$plasmid)
p <- my_plot_ratio(c(refs,day.var.gfp.list))# + 
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `host = fct_recode(host, pho2 = "pho2∆")`.
Caused by warning:
! Unknown levels in `f`: pho2∆
p

High day-to-day GFP variance: 212, 222, 229, 231, 241, 251, 252, 277, 301, 326, 328, 329, 331, 334

High day-to-day RFP variance: 212, 216, 239, 241

Plotting components for chimeras with high day-to-day variance in Pho4-mNeon

p <- my_plot_ratio(c(refs,day.var.rfp.list))
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `host = fct_recode(host, pho2 = "pho2∆")`.
Caused by warning:
! Unknown levels in `f`: pho2∆
p

Watch out for CSCscC, SCCsS, SCCsS

Plotting components for chimeras with high day-to-day variance in PHO5pr-mCherry

my_scatter_plot_fix <- function(){
  # this function is the same as the one in the script file, but is used to 
  # plot region 4 effects alone, and doesn't take any input
  s1 = my_data_select(pattern = "XXXCCX")
  s2 = my_data_select(pattern = "XXXSSX")
  scatter.colors = c("ScPho4" = "forestgreen", "CgPho4" = "blue3", 
                     "P2ID:Cg" = "deepskyblue", "P2ID:Sc" = "palegreen2",
                     "P2ID:mixed" = "gray20")
  scatter.size = c("ScPho4" = 3.5, "CgPho4" = 3.5,
                     "P2ID:Cg" = 2.5, "P2ID:Sc" = 2.5, "P2ID:mixed" = 2.5)
  p <- ximera %>% 
    # exclude the alternative break point sets "A" and "B"
    # in particular, pH294 is an alternative break point CCCcs, where P2ID:Cg
    # extends to aa 270 instead of 458. it has nearly the same activities as
    # CgPho4. However, the CCCCS in the main set has significantly reduced
    # activities both with and without Pho4. We later tested whether the 
    # additional P2ID:Cg4 rescues the effect (see Cg4ext below) and it didn't
    # so far, this seems to be an one-off. we need to further investigate its
    # activities.
    filter(set %in% c("M", "S")) %>% 
    mutate(A_PHO2 = signif(A_PHO2, digits = 2),
           A_pho2 = signif(A_pho2, digits = 2),
           group = case_when(
             symbol == "CCCCC" ~ "CgPho4",
             symbol == "SSSSS" ~ "ScPho4",
             plasmid %in% s1 ~ "P2ID:Cg",
             plasmid %in% s2 ~ "P2ID:Sc",
             .default = "P2ID:mixed"
           ),
           group = fct_relevel(group, names(scatter.colors))) %>% 
    ggplot(aes(x = A_PHO2, y = A_pho2, label = symbol)) + 
    geom_abline(slope = 1) +
    geom_point(aes(color = group, size = group)) + 
    scale_color_manual(NULL, values = scatter.colors) +
    scale_size_manual(values = scatter.size, guide = "none") +
    labs(x = bquote(A[PHO2]), y = bquote(A[pho2*Delta])) +
    theme_cowplot() + panel_border(color = "gray30", size = 1.2) +
    theme(legend.text = element_text(size = rel(0.8)),
          legend.position = c(0.03, 0.83),
          axis.title = element_text(face = 2, size = rel(1.2)),
          axis.line = element_blank())
  return(p)
}

Most of the day-to-day variance is canceled out after RFP/GFP normalization.

All chimera, scatter plot

Plot all chimeras, coloring based on P2ID source

p <- my_scatter_plot_fix()
ggsave(filename = "../img/20240308-all-chimera-scatter-color-by-P2ID.png",
       plot = p, width = 4.5, height = 4, dpi = 300)
ggplotly(p + labs(x = "A<sub>PHO2</sub>", y = "A<sub>pho2</sub>") +
           theme_gray(base_size = 16) +
           theme(legend.text = element_markdown()), 
         tooltip = c("label", "x", "y"))
my_scatter_plot_all <- function(){
  # this function is the same as my_scatter_plot_fix except that it plots all the chimeras
  # without coloring them differently. for figure 5
  s1 = my_data_select(pattern = "XXXCCX")
  s2 = my_data_select(pattern = "XXXSSX")
  scatter.colors = c("ScPho4" = "forestgreen", "CgPho4" = "blue3", 
                     "P2ID:Cg" = "gray20", "P2ID:Sc" = "gray20",
                     "P2ID:mixed" = "gray20")
  scatter.size = c("ScPho4" = 3.5, "CgPho4" = 3.5,
                   "P2ID:Cg" = 2.5, "P2ID:Sc" = 2.5, "P2ID:mixed" = 2.5)
  p <- ximera %>% 
    filter(set %in% c("M", "S")) %>% 
    mutate(A_PHO2 = signif(A_PHO2, digits = 2),
           A_pho2 = signif(A_pho2, digits = 2),
           group = case_when(
             symbol == "CCCCC" ~ "CgPho4",
             symbol == "SSSSS" ~ "ScPho4",
             plasmid %in% s1 ~ "P2ID:Cg",
             plasmid %in% s2 ~ "P2ID:Sc",
             .default = "P2ID:mixed"
           ),
           group = fct_relevel(group, names(scatter.colors))) %>% 
    ggplot(aes(x = A_PHO2, y = A_pho2, label = symbol)) + 
    geom_abline(slope = 1) +
    geom_point(aes(color = group, size = group)) + 
    scale_color_manual(NULL, values = scatter.colors) +
    scale_size_manual(values = scatter.size, guide = "none") +
    labs(x = bquote(A[PHO2]), y = bquote(A[pho2])) +
    theme_cowplot() + panel_border(color = "gray30", size = 1.2) +
    theme(legend.text = element_text(size = rel(0.8)),
          legend.position = "none",
          axis.title = element_text(face = 2, size = rel(1.2)),
          axis.line = element_blank())

  return(p)
}

This function is the same as my_scatter_plot_fix, except that it plots all the chimeras without coloring them differently. It is used for Figure 5.

Plot all chimeras, for Fig. 5

my_plot_subset_ximera <- function(symbols){
  # this function plots a subset of the chimeras as horizontal bar plots
  # showing the Rel. A_PHO2 and %A_pho2∆ values
  # it takes as input a vector containing the symbols for the chimeras for 
  # plotting. the order in the vector determines the plot order
  # the endogenous ScPho4 and CgPho4 are implied
  missing <- setdiff(symbols, ximera$symbol)
  if(length(missing) != 0)
    # collapse first so that several missing symbols give one readable message
    stop(paste(missing, collapse = ", "), " are not found")
  
  tmp <- filter(ximera, symbol %in% c("SSSSS", "CCCCC", symbols)) %>% 
    mutate(
      rSE_PHO2 = se_PHO2 / A_PHO2[symbol == "SSSSS"],
      rSE_pho2 = se_pho2 / A_pho2[symbol == "SSSSS"]
    ) %>% 
    pivot_longer(cols = c(rA_PHO2, rA_pho2, rSE_PHO2, rSE_pho2), 
                 #pivot_longer(cols = c(A_PHO2, A_pho2, se_PHO2, se_pho2), 
                 names_to = c(".value", "parameter"), names_sep = "_",
                 values_to = "value") %>% 
    mutate(parameter = fct_relevel(parameter, "PHO2"),
           symbol = factor(symbol, levels = 
                             unique(c("SSSSS", "CCCCC", symbols)))) %>% 
    select(-c(A_PHO2:boost))
  
  # labeller
  par.explain <- c(
    PHO2 = "Rel. A<sub>PHO2</sub>",
    #boost = "Boost",
    pho2 = "Rel. A<sub>pho2∆</sub>"
  )
  
  p <- ggplot(tmp, aes(y = symbol, x = rA)) +
    geom_col(width = 0.5, color = "black", fill = "gray80") +
    geom_vline(xintercept = 1, linetype = 2, color = "gray30") +
    geom_errorbar(aes(xmin = rA - rSE, xmax = rA + rSE), width = 0.2) +
    facet_wrap(~parameter, scales = "free_x",# switch = "x",
              labeller = labeller(parameter = par.explain)) +
    scale_y_discrete(limits = rev) + 
    scale_x_continuous(expand = expansion(mult = c(0.02, 0.05))) +
    theme_cowplot() + panel_border(color = "gray30") +
    background_grid(major = "y", minor = "none") +
    theme(axis.text.y = element_text(family = "courier"),
          axis.title = element_blank(),
          axis.line = element_blank(),
          strip.placement = "outside",
          strip.background = element_blank(),
          strip.text = element_markdown())
  return(p)
}

Spotlight individual chimeras

The goal here is to plot individual chimeras in order to test specific hypotheses and make certain points.

  1. We separately tested and found that CgPho4 DBD binds the consensus DNA more strongly than ScPho4 does, and it also has two additional activation booster regions, which enhance the activity of the main AD. We therefore hypothesize that by replacing the corresponding regions in ScPho4 with the parts from CgPho4, we would create a chimeric TF that is not or far less dependent on Pho2.
  2. We also expect that those regions additively contribute to the reduced Pho2-dependence, shown as increased TF activity of the chimera in the pho2∆ background.

Design plot

my_plot_subset_ximera_alt <- function(symbols){
  # this function plots a subset of the chimeras as horizontal bar plots
  # showing the Rel. A_PHO2 and %A_pho2∆ values
  # it takes as input a vector containing the symbols for the chimeras for 
  # plotting. the order in the vector determines the plot order
  # the endogenous ScPho4 and CgPho4 are implied
  missing <- setdiff(symbols, ximera$symbol)
  if(length(missing) != 0)
    # collapse first so that several missing symbols give one readable message
    stop(paste(missing, collapse = ", "), " are not found")
  
  tmp <- filter(dat_sep, symbol %in% c("SSSSS", "CCCCC", symbols)) %>% 
    mutate(host = fct_relevel(host, "PHO2"),
           symbol = factor(symbol, levels = 
                             unique(c("SSSSS", "CCCCC", symbols))))

  tmp %>% count(symbol, host) %>% print()
  # labeller
  par.explain <- c(
    PHO2 = "A<sub>PHO2</sub>",
    #boost = "Boost",
    pho2 = "A<sub>pho2∆</sub>"
  )
  
  p <- ggplot(tmp, aes(y = symbol, x = A)) +
    geom_bar(stat = "summary", fun = "mean", 
             width = 0.5, color = "black", fill = "gray80") +
    stat_summary(fun.data = "mean_cl_boot", geom = "linerange",
                 color = "steelblue4") +
    geom_point(data = filter(tmp, !symbol %in% c("CCCCC", "SSSSS")), 
               size = 0.6, shape = 3, color = "gray30") +
    #geom_vline(xintercept = 1, linetype = 2, color = "gray30") +
    #geom_errorbar(aes(xmin = rA - rSE, xmax = rA + rSE), width = 0.2) +
    facet_wrap(~host, scales = "free_x",# switch = "x",
              labeller = labeller(host = par.explain)) +
    scale_y_discrete(limits = rev) + 
    scale_x_continuous(expand = expansion(mult = c(0.02, 0.05))) +
    theme_cowplot() + panel_border(color = "gray30") +
    background_grid(major = "y", minor = "none") +
    theme(axis.text.y = element_text(family = "courier"),
          axis.title = element_blank(),
          axis.line = element_blank(),
          strip.placement = "outside",
          strip.background = element_blank(),
          strip.text = element_markdown())
  return(list(data = tmp, plot = p))
}

Update 2024-11-22

Alternative design, with individual points and not relative to ScPho4. Also, individual datapoints were plotted for samples with <10 replicates

selected <- as.character(
  expression(CSSSS, SCSSS, CCSSS, SSCSS, CSCSS, SCCSS, CCCSS, SSSSC, SSCSC, CSCSC, CSScC))
#selected <- filter(meta, symbol %in% selected) %>% pull(plasmid)
my_plot_subset_ximera(selected)
ggsave("../img/20240308-selected-chimera-rel-activity.png", width = 4, height = 4)

# plot with absolute A not relative, and plot individual data points
plot.sub1 <- my_plot_subset_ximera_alt(selected)
print(plot.sub1$plot)
ggsave("../img/20241122-selected-chimera-rel-activity.png", width = 4, height = 4)

# save the data for publication
plot.sub1$data %>% 
  select(plasmid_id = plasmid, chimera_makeup = symbol, host, A) %>% 
  write_tsv("../output/20250213-Fig-5C-data.tsv")

Minimal CgPho4 parts for A_pho2

Here we look for the chimeras that contain the least amount of CgPho4 and yet have appreciable activity in the absence of Pho2. These include SSSSS, CCCCC, SSSSC, CSSSS, SSCSS, CSCSS, CSSSC, CSCSC

# select the chimeras
selected <- as.character(
  expression(SSSSS, CSSSS, SCSSS, SSCSS, CCSSS, CSCSS, SCCSS, CCCSS)
)

# extract the data
tmp <- ximera %>% 
  filter(symbol %in% selected, set == "M") %>% 
  select(plasmid, symbol, group) %>% 
  inner_join(dat, by = "plasmid") %>% 
  mutate( `R/G` = YL2.H / BL1.H ) %>%
  filter(flag == "pass") %>% 
  select(-nRFP, -nGFP, -well, -flag)

# prepare the factor levels
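# the first three positions become columns R1, AD and NLS; the unnamed
# trailing width of 2 (regions 4-5) is matched but dropped from the output
split <- c(R1 = 1, AD = 1, NLS = 1, 2)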

tmp <- tmp %>% 
  separate_wider_position(symbol, split) %>% 
  mutate(across(R1:NLS, ~factor(.x, levels = c("S", "C"))))

# test A_PHO2
print("Testing A_PHO2")
[1] "Testing A_PHO2"
lm.res <- tmp %>% 
  filter(host == "PHO2") %>% 
  lm(`R/G` ~ (R1*AD*NLS), data = .) %>% 
  summary()
# adding adjusted P-value
lm.res$coefficients <- cbind(
  coef(lm.res),
  "P.adj" = p.adjust(coef(lm.res)[,'Pr(>|t|)'], method = "holm")
)
print(lm.res)

Call:
lm(formula = `R/G` ~ (R1 * AD * NLS), data = .)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.4865 -1.6014 -0.5069  1.0472 12.7647 

Coefficients:
                Estimate  Std. Error     t value    Pr(>|t|) P.adj
(Intercept)    1.764e+01   2.391e-01   7.375e+01  5.546e-137 0.000
R1C            3.591e+00   1.196e+00   3.004e+00   3.046e-03 0.012
ADC            2.237e+00   1.196e+00   1.871e+00   6.296e-02 0.063
NLSC           1.499e+01   1.196e+00   1.254e+01   2.350e-26 0.000
R1C:ADC        7.578e+00   2.109e+00   3.593e+00   4.213e-04 0.002
R1C:NLSC      -5.110e+00   1.904e+00  -2.684e+00   7.959e-03 0.021
ADC:NLSC       5.586e+00   2.043e+00   2.734e+00   6.873e-03 0.021
R1C:ADC:NLSC  -1.383e+01   3.064e+00  -4.514e+00   1.142e-05 0.000

Residual standard error: 2.869 on 181 degrees of freedom
Multiple R-squared:  0.825, Adjusted R-squared:  0.8182 
F-statistic: 121.9 on 7 and 181 DF,  p-value: < 2.2e-16
# store the test results for plotting
res.PHO2 <- coef(lm.res)[-1,] %>% as_tibble(rownames = "component")

# test A_pho2
print("Testing A_pho2∆")
[1] "Testing A_pho2∆"
lm.res <- tmp %>% 
  filter(host == "pho2") %>% 
  lm(`R/G` ~ (R1*AD*NLS), data = .) %>% 
  summary()
# adding adjusted P-value
lm.res$coefficients <- cbind(
  coef(lm.res),
  "P.adj" = p.adjust(coef(lm.res)[,'Pr(>|t|)'], method = "holm")
)
print(lm.res)

Call:
lm(formula = `R/G` ~ (R1 * AD * NLS), data = .)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.63622 -0.21548 -0.01744  0.10489  1.15930 

Coefficients:
               Estimate Std. Error    t value   Pr(>|t|) P.adj
(Intercept)   1.072e+00  5.581e-02  1.920e+01  1.312e-29  0.00
R1C           6.167e-01  1.477e-01  4.176e+00  8.404e-05  0.00
ADC          -1.153e-02  1.477e-01 -7.805e-02  9.380e-01  1.00
NLSC          1.819e+00  1.477e-01  1.232e+01  3.389e-19  0.00
R1C:ADC       5.923e-01  2.433e-01  2.435e+00  1.746e-02  0.07
R1C:NLSC      1.479e+00  2.433e-01  6.081e+00  5.623e-08  0.00
ADC:NLSC      1.575e-01  2.433e-01  6.475e-01  5.194e-01  1.00
R1C:ADC:NLSC -3.795e-01  3.660e-01 -1.037e+00  3.033e-01  0.91

Residual standard error: 0.3349 on 70 degrees of freedom
Multiple R-squared:  0.9553,    Adjusted R-squared:  0.9509 
F-statistic: 213.9 on 7 and 70 DF,  p-value: < 2.2e-16
# store the test results for plotting
res.pho2 <- coef(lm.res)[-1,] %>% as_tibble(rownames = "component")

# combine the results
test.res <- bind_rows(
  "A_PHO2" = res.PHO2, "A_pho2" = res.pho2, .id = "parameter"
)

# save the output in a text file for paper
write_tsv(test.res, file = "../output/20250213-region-1-3-linear-model-test.txt")
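
As a sanity check on the `P.adj` column, the Holm adjustment can be reproduced by hand from the raw p-values printed above for the PHO2 model. This is a minimal base R sketch, assuming the eight p-values (intercept included, as in the code above) are typed in from the printed coefficient table; `p.raw`, `holm.manual` and `holm.builtin` are illustrative names, not objects used elsewhere in this notebook.

```r
# raw p-values copied from the printed A_PHO2 coefficient table
p.raw <- c(
  "(Intercept)"  = 5.546e-137,
  "R1C"          = 3.046e-03,
  "ADC"          = 6.296e-02,
  "NLSC"         = 2.350e-26,
  "R1C:ADC"      = 4.213e-04,
  "R1C:NLSC"     = 7.959e-03,
  "ADC:NLSC"     = 6.873e-03,
  "R1C:ADC:NLSC" = 1.142e-05
)

# Holm by hand: sort ascending, multiply the i-th smallest by (m - i + 1),
# enforce monotonicity with a running maximum, cap at 1, restore the order
m <- length(p.raw)
o <- order(p.raw)
holm.sorted <- pmin(1, cummax((m - seq_len(m) + 1) * p.raw[o]))
holm.manual <- holm.sorted[order(o)]   # order(o) is the inverse permutation
names(holm.manual) <- names(p.raw)

# built-in version, as used in the analysis
holm.builtin <- p.adjust(p.raw, method = "holm")

round(holm.manual, 3)
all.equal(unname(holm.manual), unname(holm.builtin))
```

Because the intercept has the smallest p-value in this table, dropping it from the family before adjusting would leave the adjusted values of the remaining seven terms unchanged.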

Region 1-3 main effects and interactions

par.explain <- c(
  A_PHO2 = "A<sub>PHO2</sub>",
  A_pho2 = "A<sub>pho2∆</sub>"
)

p <- test.res %>% 
  rename(estimate = Estimate, se = `Std. Error`) %>% 
  mutate(
    component = gsub("C", "", component) %>% fct_inorder(),
    parameter = factor(parameter, levels = c("A_PHO2", "A_pho2")),
    sig = P.adj < 0.05
  ) %>% 
  ggplot(aes(x = component, y = estimate)) +
  geom_hline(yintercept = 0, linetype = 1, color = "gray50") +
  geom_col(aes(fill = P.adj < 0.05), width = 0.5, color = "black") +
  geom_pointrange(aes(ymin = estimate-se, ymax = estimate+se), size = 0.2) +
  facet_wrap(~parameter, scales = "free_y", nrow = 2,
             labeller = labeller(parameter = par.explain)) +
  scale_x_discrete() + 
  scale_y_continuous() +
  scale_fill_manual(NULL, 
                    values = c("gray90", "gray50")) +
  theme_cowplot() + panel_border(color = "gray30") +
  background_grid(major = "y", minor = "y") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, size = rel(1)),
        axis.title = element_blank(),
        axis.line = element_blank(),
        legend.position = "bottom",
        strip.placement = "outside",
        strip.background = element_blank(),
        strip.text = element_markdown(size = rel(1)))
p
ggsave("../img/20240404-region1-3-epistasis-plot.png", width = 3.5, height = 4.5)

Plot the result

selected <- as.character(
  expression(CCCSC, CCCcsC, CCCscC, CSSCC, CSSSC, CSScsC, CSSscC))
#selected <- filter(meta, symbol %in% selected) %>% pull(plasmid)
plot.sub2 <- my_plot_subset_ximera_alt(selected)
plot.sub2$plot + scale_x_continuous(expand = expansion(mult = c(0.02, 0.15)))
Scale for x is already present.
Adding another scale for x, which will replace the existing scale.
ggsave("../img/20241122-P2ID-split-rel-activity.png", width = 3.5, height = 3.5)

plot.sub2$data %>% 
  select(plasmid_id = plasmid, chimera_makeup = symbol, host, A) %>% 
  write_tsv("../output/20250213-Fig-6B-data.tsv")

Region 4 splits

So far we have focused on the main set with the five-region design. In the scatter plot below, a subset of chimeras falls between the P2ID:Sc and P2ID:Cg groups. They are interesting because their A_pho2∆/A_PHO2 ratios are intermediate.

[Figure: scatter plot]
# select the chimeras
selected <- as.character(
  expression(CCCCC, CCCSC, CCCcsC, CCCscC)
)

# extract the data
tmp <- ximera %>% 
  filter(symbol %in% selected) %>% 
  select(plasmid, symbol, group) %>% 
  inner_join(dat, by = "plasmid") %>% 
  mutate( `R/G` = YL2.H / BL1.H ) %>%
  filter(flag == "pass") %>% 
  select(-nRFP, -nGFP, -well, -flag) %>% 
  mutate(symbol = factor(symbol, levels = !!selected))

# test A_PHO2
print("Testing A_PHO2")
[1] "Testing A_PHO2"
lm.res <- tmp %>% 
  filter(host == "PHO2") %>% 
  lm(`R/G` ~ symbol, data = .) %>% 
  summary()
# adding adjusted P-value
lm.res$coefficients <- cbind(
  coef(lm.res),
  "P.adj" = p.adjust(coef(lm.res)[,'Pr(>|t|)'], method = "holm")
)
print(lm.res)

Call:
lm(formula = `R/G` ~ symbol, data = .)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.2250 -1.0489 -0.0647  1.2813  6.6290 

Coefficients:
               Estimate Std. Error    t value   Pr(>|t|) P.adj
(Intercept)   1.809e+01  3.445e-01  5.252e+01  2.067e-45     0
symbolCCCSC   4.207e+00  9.115e-01  4.615e+00  2.766e-05     0
symbolCCCcsC -7.086e+00  9.115e-01 -7.774e+00  3.722e-10     0
symbolCCCscC  1.362e+01  9.115e-01  1.494e+01  4.511e-20     0

Residual standard error: 2.067 on 50 degrees of freedom
Multiple R-squared:  0.8711,    Adjusted R-squared:  0.8633 
F-statistic: 112.6 on 3 and 50 DF,  p-value: < 2.2e-16
# store the test results for plotting
#res.PHO2 <- coef(lm.res)[-1,] %>% as_tibble(rownames = "component")

# test A_pho2
print("Testing A_pho2∆")
[1] "Testing A_pho2∆"
lm.res <- tmp %>% 
  filter(host == "pho2") %>% 
  lm(`R/G` ~ symbol, data = .) %>% 
  summary()
# adding adjusted P-value
lm.res$coefficients <- cbind(
  coef(lm.res),
  "P.adj" = p.adjust(coef(lm.res)[,'Pr(>|t|)'], method = "holm")
)
print(lm.res)

Call:
lm(formula = `R/G` ~ symbol, data = .)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.1275 -0.5912 -0.0824  0.5645  5.2165 

Coefficients:
               Estimate Std. Error    t value   Pr(>|t|) P.adj
(Intercept)   1.674e+01  2.869e-01  5.834e+01  1.176e-47 0.000
symbolCCCSC  -1.195e+01  7.591e-01 -1.574e+01  5.226e-21 0.000
symbolCCCcsC -1.255e+01  7.591e-01 -1.654e+01  6.456e-22 0.000
symbolCCCscC  1.928e+00  7.591e-01  2.540e+00  1.424e-02 0.014

Residual standard error: 1.721 on 50 degrees of freedom
Multiple R-squared:  0.9093,    Adjusted R-squared:  0.9038 
F-statistic:   167 on 3 and 50 DF,  p-value: < 2.2e-16

Statistical tests for group 1

# select the chimeras
selected <- as.character(
  expression( CSSCC, CSSSC, CSScsC, CSSscC )
)

# extract the data
tmp <- ximera %>% 
  filter(symbol %in% selected) %>% 
  select(plasmid, symbol, group) %>% 
  inner_join(dat, by = "plasmid") %>% 
  mutate( `R/G` = YL2.H / BL1.H ) %>%
  filter(flag == "pass") %>% 
  select(-nRFP, -nGFP, -well, -flag) %>% 
  mutate(symbol = factor(symbol, levels = !!selected))

# test A_PHO2
print("Testing A_PHO2")
[1] "Testing A_PHO2"
lm.res <- tmp %>% 
  filter(host == "PHO2") %>% 
  lm(`R/G` ~ symbol, data = .) %>% 
  summary()
# adding adjusted P-value
lm.res$coefficients <- cbind(
  coef(lm.res),
  "P.adj" = p.adjust(coef(lm.res)[,'Pr(>|t|)'], method = "holm")
)
print(lm.res)

Call:
lm(formula = `R/G` ~ symbol, data = .)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.4560 -0.5974 -0.1128  0.4629  4.4143 

Coefficients:
              Estimate Std. Error   t value  Pr(>|t|) P.adj
(Intercept)  6.270e+00  7.393e-01 8.481e+00 4.673e-08 0.000
symbolCSSSC  8.962e+00  1.046e+00 8.572e+00 3.949e-08 0.000
symbolCSScsC 2.558e+00  1.046e+00 2.447e+00 2.378e-02 0.024
symbolCSSscC 6.372e+00  1.046e+00 6.094e+00 5.891e-06 0.000

Residual standard error: 1.811 on 20 degrees of freedom
Multiple R-squared:  0.8127,    Adjusted R-squared:  0.7846 
F-statistic: 28.93 on 3 and 20 DF,  p-value: 1.791e-07
# store the test results for plotting
#res.PHO2 <- coef(lm.res)[-1,] %>% as_tibble(rownames = "component")

# test A_pho2
print("Testing A_pho2∆")
[1] "Testing A_pho2∆"
lm.res <- tmp %>% 
  filter(host == "pho2") %>% 
  lm(`R/G` ~ symbol, data = .) %>% 
  summary()
# adding adjusted P-value
lm.res$coefficients <- cbind(
  coef(lm.res),
  "P.adj" = p.adjust(coef(lm.res)[,'Pr(>|t|)'], method = "holm")
)
print(lm.res)

Call:
lm(formula = `R/G` ~ symbol, data = .)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.77040 -0.08856 -0.03391  0.16099  0.37316 

Coefficients:
               Estimate Std. Error    t value   Pr(>|t|) P.adj
(Intercept)   5.244e+00  1.082e-01  4.846e+01  3.257e-22 0.000
symbolCSSSC  -2.822e+00  1.530e-01 -1.844e+01  5.058e-14 0.000
symbolCSScsC -3.061e+00  1.530e-01 -2.000e+01  1.076e-14 0.000
symbolCSSscC -1.243e-01  1.530e-01 -8.125e-01  4.261e-01 0.426

Residual standard error: 0.265 on 20 degrees of freedom
Multiple R-squared:  0.9726,    Adjusted R-squared:  0.9685 
F-statistic: 237.1 on 3 and 20 DF,  p-value: 8.565e-16

Statistical tests for group 2

split <- c(1,1,1,1,1); names(split) <- paste0("P", 1:5)
tmp <- ximera %>% 
  filter(set == "M", group != "n.f.") %>% 
  separate_wider_position(symbol, split) %>% 
  mutate(across(P1:P5, ~factor(.x, levels = c("S", "C"))))
lm <- lm(A_pho2 ~ (P1+P2+P3+P4+P5), data = tmp)
summary(lm)

Call:
lm(formula = A_pho2 ~ (P1 + P2 + P3 + P4 + P5), data = tmp)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5062 -1.1430 -0.0082  0.8638  6.5285 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.7264     1.0811  -0.672   0.5102    
P1C           2.3316     0.8687   2.684   0.0152 *  
P2C           0.7880     0.8687   0.907   0.3763    
P3C           2.5260     0.9010   2.804   0.0117 *  
P4C           4.0647     0.9792   4.151   0.0006 ***
P5C           1.2254     0.9010   1.360   0.1906    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.086 on 18 degrees of freedom
Multiple R-squared:  0.7398,    Adjusted R-squared:  0.6675 
F-statistic: 10.23 on 5 and 18 DF,  p-value: 9.054e-05

Region main effect

```r
my_calc_region_effect <- function(region, variable){
  # this function takes the name of a variable of interest
  # region specifies the foreground region, which will be examined for its
  # effect on the variable of interest.
  # it then transforms the ximera data frame to preserve only the variable of
  # interest, pivots it wider after grouping by the background composition.
  
  # prepare the data by mutating the symbol column into fg and bg
  valid.var <- c("A_PHO2", "A_pho2", "rA_PHO2", "rA_pho2", "boost")
  if(!variable %in% valid.var)
    stop(paste0("Please specify one of the valid variable names:", 
                paste(valid.var, collapse = "
```

The main effects were calculated by averaging over all chimeras with the CgPho4 region at the respective position. I'd like to break them down by background. For example, for region 3, I'd like to see the pairwise comparison between CCCSS and CCSSS, where only region 3 differs. The steps are:

  1. Select the region to be compared and split the symbol into two parts: the genotype of the focal region and the rest.
  2. Group by the second part (the rest) and calculate the differential.

```r
my_plot_region_effect_twovar_line_par <- function(regions){
  # this function uses my_comp_region_effect to generate the data
  # and plot the difference in A_PHO2 and A_pho2 between the CgPho4 vs ScPho4
  # in the focal region
  dat <- map_dfr(regions, \(region) my_comp_region_effect(region), .id = "region") %>% 
    pivot_longer(cols = c(dA_PHO2, dA_pho2), 
                 names_to = "host", values_to = "diff") %>% 
    mutate(host = fct_recode(host, `PHO2` = "dA_PHO2", `pho2∆` = "dA_pho2"),
           host = fct_relevel(host, "PHO2"))
  # specify grouping variable
  dat <- mutate(dat, 
                grp = str_sub(bg, 4, 4) %>% toupper(),
                grp = fct_recode(grp, CgPho4 = "C", ScPho4 = "S"))#,
                #sh = str_sub(bg, 5, 5) %>% toupper(),
                #sh = fct_recode(sh, CgPho4 = "C", ScPho4 = "S") )
  # specify arrow annotation
  arrow.x = 0.7
  arrow.y = (max(dat$diff) - min(dat$diff)) / 5 
  # plot
  p <- dat %>% 
    ggplot(aes(x = host, y = diff, label = bg)) +
    geom_point(aes(color = grp), size = 2, alpha = 0.8,
               position = position_jitter(0.1)) + 
    geom_line(aes(group = bg), linewidth = 0.2, alpha = 0.8) +
    facet_grid(region ~ grp, labeller = labeller(
      grp = c(CgPho4 = "P2ID:Cg", ScPho4 = "P2ID:Sc"),
      region = label_both
    )) +
    scale_color_manual("P2ID:", values = c("orange", "gray30"), guide = "none") +
    #scale_shape_manual("DBD:", values = c(19, 1)) +
    ylab("Region swap effect (Cg-Sc)") +
    theme_bw(base_size = 18) + 
    theme(
      axis.title.x = element_blank(),
      axis.title.y = element_text(size = rel(0.9)),
      axis.text.x = element_text(face = 3),
      axis.text.y = element_text(size = rel(0.8)),
      legend.text = element_text(size = rel(0.8)),
      legend.title = element_text(size = rel(0.9)),
      legend.position = "top",
      strip.background = element_blank()
    )
  return(p)
}
```


    x = 5
    p1 <- my_plot_region_effect_onevar(x, "A_PHO2")
    p2 <- my_plot_region_effect_onevar(x, "A_pho2")
    subplot(p1, p2, margin = 0.05) %>% 
      layout(title = paste("Region", x, "swap effect on A_PHO2 and A_pho2", sep = " "),
             xaxis = list(title = paste0("Region ", x, " from CgPho4")),
             yaxis = list(title = paste0("Region", x, " from ScPho4"))
      )


Here, I'd like to take what I built above and create a new tibble, in which each row is a different background (the makeup of the chimera except for the focal region). The value columns are:

1.  dA_PHO2 = A_PHO2_Cg - A_PHO2_Sc
2.  dA_pho2 = A_pho2_Cg - A_pho2_Sc
3.  A_PHO2_Sc = A_PHO2_Sc

The goal is to plot dA_PHO2 and dA_pho2 side-by-side for each background.
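
The transformation described above can be sketched on a made-up four-row table standing in for `ximera` (the activity values are invented for illustration). The helper name `sketch_region_diff`, the `fg`/`bg` column names, and the choice to mask the focal region with an `x` are assumptions made for this example. They are not the functions defined in this notebook.

```r
library(dplyr)
library(tidyr)

# toy stand-in for the ximera table; values are invented for illustration
toy <- tibble(
  symbol = c("CCCSS", "CCSSS", "SCCSS", "SCSSS"),
  A_PHO2 = c(20, 12, 15, 10),
  A_pho2 = c(6, 1, 4, 1)
)

sketch_region_diff <- function(df, region) {
  df %>%
    mutate(
      # genotype of the focal region (foreground)
      fg = substr(symbol, region, region),
      # the rest of the chimera, with the focal region masked (background)
      bg = paste0(substr(symbol, 1, region - 1), "x",
                  substr(symbol, region + 1, nchar(symbol)))
    ) %>%
    select(bg, fg, A_PHO2, A_pho2) %>%
    # one row per background, with the Cg and Sc values side by side
    pivot_wider(names_from = fg, values_from = c(A_PHO2, A_pho2)) %>%
    mutate(
      dA_PHO2   = A_PHO2_C - A_PHO2_S,
      dA_pho2   = A_pho2_C - A_pho2_S,
      A_PHO2_Sc = A_PHO2_S
    ) %>%
    select(bg, dA_PHO2, dA_pho2, A_PHO2_Sc)
}

sketch_region_diff(toy, region = 3)
```

Masking the focal region rather than deleting it keeps the remaining positions aligned with the original symbol, so a later step can still index the background by region number.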



```r
my_plot_region_effect_twovar_line_par(c(1,3))
ggsave("../img/20240310-region-swap-effect-1n3-on-4.png",
       width = 5, height = 3.5)
```


```r
x <- my_data_select(pattern = "XXXXCS", Set = "M")
my_data_prep(x) %>% 
  mutate(group = fct_recode(group, "chimera" = "n.f.")) %>% 
  my_plot_components()
ggsave("../img/20240213-P2ID_Cg-DBD_Sc-components.png", width = 8, height = 5)
```


    my_plot_region_effect_twovar_line("4", "5")# %>% ggplotly()
    ggsave("../img/20231221-region-swap-effect-4-on-5.png", width = 6, height = 4, dpi = 150)
    my_plot_region_effect_twovar_line("5", "4")# %>% ggplotly()
    ggsave("../img/20231224-region-swap-effect-5-on-4.png", width = 6, height = 4, dpi = 200)


The main plotting functions are now in a separate script file in `../script`. The function below adapts the plot for a figure in the paper, showing regions 1-3 simultaneously.



```r
my_upper_triangular_mat <- function(alt = "C", var = "A_PHO2", nf.as.na = F){
  # given the alternative allele (C/S) and a variable of interest, e.g., A_PHO2,
  # output an upper triangular matrix containing the values from the variable 
  # of interest, with the row and col numbers based on the first and second
  # positions containing the alternative allele. If all positions contain the 
  # reference allele, the value is subtracted from all values in the matrix
  # when just one position is the alternative allele, the value in the diagonal
  # is set. when there are more than 2 regions containing the alternative allele
  # skip.
  # if "nf.as.na = TRUE", evaluate if the activity of either of the two chimeras
  # being compared is non functional. if yes, set the corresponding matrix value
  # to NA
  out_mat <- matrix(NA, nrow = 5, ncol = 5)
  ref_val <- NA
  dat <- filter(ximera, set == "M") %>% 
    mutate(S = as.character(symbol) %>% toupper())
  if(nf.as.na){
    dat <- filter(dat, group != "n.f.")
  }
  for(i in seq(1, nrow(dat))){
    symbol = dat[i, "S"]
    # determine which positions contain the alternative allele
    p = str_locate_all(symbol, alt)[[1]][,"start"]
    l = length(p)   # how many positions contain the alt allele
    v = dat[[var]][i] # retrieve the value of the variable
    if(l == 0)
      ref_val = v
    else if(l == 1)
      out_mat[p, p] = v
    else if(l == 2)
      out_mat[p[1], p[2]] = v
  }
  out_mat = out_mat - ref_val
  return(out_mat)
}
```


```r
my_combined_triangular_mat <- function(alt = "C"){
  # given the alternative allele (C/S), output a matrix containing the values
  # for both with and without Pho2, arranged in two complementary triangular
  # matrices, with the row and col numbers based on the first and second
  # positions containing the alternative allele. If all positions contain the 
  # reference allele, the value is subtracted from all values in the matrix
  # when just one position is the alternative allele, the value in the diagonal
  # is set. when there are more than 2 regions containing the alternative allele
  # skip.
  out_mat <- matrix(NA, nrow = 6, ncol = 6)
  # with Pho2: upper triangle, shifted one column to the right
  upper <- cbind(NA, my_upper_triangular_mat(alt, var = "A_PHO2")) %>% 
    rbind(., NA)
  # without Pho2: transposed into the lower triangle, shifted one row down
  lower <- rbind(NA, t(my_upper_triangular_mat(alt, var = "A_pho2"))) %>% 
    cbind(., NA)
  out_mat = ifelse(is.na(upper), lower, upper)
  return(out_mat)
}
```


```r
my_plot_triangle_heatmap <- function(alt, var){
  # this function takes the output of the function above and makes a heatmap
  # using pheatmap function, then rotates it using grid graphics
  # thanks to https://bookdown.org/rdpeng/RProgDA/the-grid-package.html#grid-graphics-coordinate-systems
  # adding title based on https://davetang.github.io/muse/pheatmap.html
  
  # construct title of plot
  ref = ifelse(alt == "C", "ScPho4", "CgPho4")
  bg = ifelse(var == "A_PHO2", "with PHO2", "w/o pho2")
  my_title <- paste("Epistasis between regions on", ref, "background", bg)
  test <- my_upper_triangular_mat(alt = alt, var = var)
  paletteLength = 50
  myColors <- colorRampPalette(c("steelblue3", "gray90", "red"))(paletteLength)
  rng <- max(abs(test), na.rm = TRUE)
  myBreaks <- c(seq(-rng, 0, length.out=ceiling(paletteLength/2) + 1), 
                seq(rng/paletteLength, rng,
                    length.out=floor(paletteLength/2)))
  p <- pheatmap::pheatmap(test, color = myColors, breaks = myBreaks,
                          border_color = NA, na_col = NA, silent = TRUE,
                          cluster_cols = FALSE, cluster_rows = FALSE)
  vp <- viewport(x = 0.5, y = 0.25,
                 width = unit(4.5, "in"), height = unit(4.5, "in"), angle = 47) 
  grid.newpage()
  pushViewport(vp)
  grid.draw(p$gtable)
  popViewport()
  grid.text(label = my_title, x = 0.5, y = 0.95, gp = gpar(fontsize = 16, fontface = "bold"))
  return(p)
}
```



## P2ID:Cg_DBD:Sc fail

Highlight the subset of chimeras with P2ID:Cg + DBD:Sc, most of which are nonfunctional.



```r
my_plot_combined_triangle_heatmap <- function(alt){
  # this function takes the output of the function my_combined_triangular_mat()
  # using pheatmap function, then rotates it using grid graphics
  # thanks to https://bookdown.org/rdpeng/RProgDA/the-grid-package.html#grid-graphics-coordinate-systems
  # adding title based on https://davetang.github.io/muse/pheatmap.html
  
  # construct title of plot
  ref = ifelse(alt == "C", "ScPho4", "CgPho4")
  my_title <- paste("Epistasis between regions on", ref, "background")
  test <- my_combined_triangular_mat(alt = alt)
  paletteLength = 50
  myColors <- colorRampPalette(c("steelblue", "gray90", "red"))(paletteLength)
  rng <- max(abs(test), na.rm = TRUE)
  myBreaks <- c(seq(-rng, 0, length.out=ceiling(paletteLength/2) + 1), 
                seq(rng/paletteLength, rng,
                    length.out=floor(paletteLength/2)))
  p <- pheatmap::pheatmap(test, color = myColors, breaks = myBreaks,
                          border_color = NA, na_col = NA, silent = TRUE,
                          cluster_cols = FALSE, cluster_rows = FALSE)
  vp <- viewport(x = 0.5, y = 0.45,
                 width = unit(3, "in"), height = unit(2.8, "in"), angle = 47) 
  grid.newpage()
  pushViewport(vp)
  grid.draw(p$gtable)
  popViewport()
  grid.text(label = my_title, x = 0.5, y = 0.95, 
            gp = gpar(fontsize = 16, fontface = "bold"))
  grid.text(label = "With Pho2", x = 0.1, y = 0.65, just = c("left", "top"),
            gp = gpar(fontsize = 14, fontface = "bold"))
  grid.text(label = "Without pho2", x = 0.1, y = 0.25, just = c("left", "top"), 
            gp = gpar(fontsize = 14, fontface = "bold"))
  return(p)
}
```


```r
png("../img/20240115-triangle-heatmap-CgPho4-ref.png", width = 7, height = 5, units = "in", res = 300)
p1 <- my_plot_combined_triangle_heatmap("C")
dev.off()
png("../img/20240115-triangle-heatmap-ScPho4-ref.png", width = 7, height = 5, units = "in", res = 300)
p2 <- my_plot_combined_triangle_heatmap("S")
dev.off()
```



## Triangle heatmap

First, write a function to generate the data for plotting. If we are going to use ggplot, we need a tibble to store the data, something in the following form

| plasmid | symbol | RegionA | RegionB | A_PHO2 | A_pho2 | rA_PHO2 | boost | perc_pho2 |
|:--------|:-------|:--------|:--------|:-------|:-------|:--------|:------|:----------|
| 209     | CCSCC  | 3       | 3       | 8.25   | 7.82   | 0.468   | 1.06  | 0.94      |

If we are OK with not using ggplot (heatmaps are not ggplot's strength anyway), we can just build a matrix.

Note that this way of summarizing the data has two limitations: 1) it requires specifying a reference, either CCCCC or SSSSS, and everything is measured against it; 2) it only shows pairwise (two-region) interactions. The second turns out to be fine with five regions, since every chimera can be expressed as a 0, 1 or 2 region swap from one of the two reference genotypes. With 6 or more regions, higher-order (3 or more region) interactions cannot be visualized this way. Because of this, we will focus on just the main set for this analysis.
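
The claim about five regions can be checked by brute force. The base R sketch below enumerates all two-species chimeras for a given number of regions and counts, for each, the number of swaps from the nearer of the two references (all-Sc or all-Cg). `min_swaps` is a hypothetical helper written only for this check.

```r
min_swaps <- function(n.regions) {
  # all 2^n genotypes, coded TRUE where a region comes from CgPho4
  geno <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), n.regions)))
  n.cg <- rowSums(geno)            # swaps relative to the all-Sc reference
  pmin(n.cg, n.regions - n.cg)     # swaps relative to the nearer reference
}

max(min_swaps(5))    # largest number of swaps needed with five regions
max(min_swaps(6))    # largest number of swaps needed with six regions
table(min_swaps(5))  # how many genotypes need 0, 1 or 2 swaps
```

With five regions the maximum is 2, so the two reference-specific matrices together cover every genotype. With six regions the maximum is 3, and those genotypes have no cell in either matrix.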

To build the matrix, we need to first identify the chimeras that belong to the set. For that, we will use the "main" set, with the five region split, for the moment at least. The function will first determine which reference to use. If we use SSSSS as the reference, for example, we will assign 0 to the reference. All other chimeras with 1 or 2 regions from Cg will be used to fill an upper triangular matrix, using one of the values of interest, e.g., A_PHO2.
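
To make the indexing concrete, here is a small base R sketch of the filling rule on four made-up genotypes with SSSSS as the reference. The values are invented, and `toy` and `alt_positions` are illustrative names. The sketch uses `strsplit()` in place of the stringr call and is not a replacement for the notebook's own function.

```r
# toy data: reference, two single swaps and one double swap (values invented)
toy <- data.frame(
  symbol = c("SSSSS", "SCSSS", "SSSCS", "SCSCS"),
  A_PHO2 = c(2, 5, 3, 9)
)

# positions in the symbol that carry the alternative allele
alt_positions <- function(symbol, alt = "C") {
  which(strsplit(symbol, "")[[1]] == alt)
}

out_mat <- matrix(NA_real_, nrow = 5, ncol = 5)
ref_val <- NA_real_
for (i in seq_len(nrow(toy))) {
  p <- alt_positions(toy$symbol[i])
  v <- toy$A_PHO2[i]
  if (length(p) == 0) {
    ref_val <- v                 # reference genotype
  } else if (length(p) == 1) {
    out_mat[p, p] <- v           # single swap goes on the diagonal
  } else if (length(p) == 2) {
    out_mat[p[1], p[2]] <- v     # double swap goes above the diagonal
  }
}
out_mat <- out_mat - ref_val     # express everything relative to the reference
out_mat[c(2, 4), c(2, 4)]
```

Note that an off-diagonal entry is the total effect of the double swap relative to the reference. The deviation from additivity would be that entry minus the two corresponding diagonal entries.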



```r
my_upper_triangular_mat <- function(alt = "C", var = "A_PHO2", nf.as.na = FALSE){
  # given the alternative allele (C/S) and a variable of interest, e.g., A_PHO2,
  # output an upper triangular matrix containing the values from the variable 
  # of interest, with the row and col numbers based on the first and second
  # positions containing the alternative allele.
  # - if all positions contain the reference allele, the value is stored as the
  #   reference and subtracted from all values in the matrix
  # - if just one position is the alternative allele, the diagonal value is set
  # - if more than 2 regions contain the alternative allele, the chimera is
  #   skipped
  # if "nf.as.na = TRUE", non-functional chimeras (group == "n.f.") are dropped,
  # so the corresponding matrix entries stay NA
  out_mat <- matrix(NA, nrow = 5, ncol = 5)
  ref_val <- NA
  dat <- filter(ximera, set == "M") %>% 
    mutate(S = as.character(symbol) %>% toupper())
  if(nf.as.na){
    dat <- filter(dat, group != "n.f.")
  }
  for(i in seq_len(nrow(dat))){
    symbol = dat[["S"]][i]   # extract a character string, not a 1x1 tibble
    # determine which positions contain the alternative allele
    p = str_locate_all(symbol, alt)[[1]][,"start"]
    l = length(p)   # how many positions contain the alt allele
    v = dat[[var]][i] # retrieve the value of the variable
    if(l == 0)
      ref_val = v
    else if(l == 1)
      out_mat[p, p] = v
    else if(l == 2)
      out_mat[p[1], p[2]] = v
  }
  out_mat = out_mat - ref_val
  return(out_mat)
}
my_combined_triangular_mat <- function(alt = "C"){
  # given the alternative allele (C/S), output a matrix containing the values
  # for both with and without Pho2, arranged in two complementary triangular
  # matrices, with the row and col numbers based on the first and second
  # positions containing the alternative allele. If all positions contain the 
  # reference allele, the value is subtracted from all values in the matrix
  # when just one position is the alternative allele, the value in the diagonal
  # is set. when there are more than 2 regions containing the alternative allele
  # skip.
  out_mat <- matrix(NA, nrow = 6, ncol = 6)
  upper <- cbind(NA, my_upper_triangular_mat(alt, var = "A_PHO2")) %>% 
    rbind(., NA)
  lower <- rbind(NA, t(my_upper_triangular_mat(alt, var = "A_pho2"))) %>% 
    cbind(., NA)
  out_mat = ifelse(is.na(upper), lower, upper)
  return(out_mat)
}
my_plot_triangle_heatmap <- function(alt, var){
  # this function takes the output of the function above and makes a heatmap
  # using pheatmap function, then rotates it using grid graphics
  # thanks to https://bookdown.org/rdpeng/RProgDA/the-grid-package.html#grid-graphics-coordinate-systems
  # adding title based on https://davetang.github.io/muse/pheatmap.html
  
  # construct title of plot
  ref = ifelse(alt == "C", "ScPho4", "CgPho4")
  bg = ifelse(var == "A_PHO2", "with PHO2", "w/o pho2")
  my_title <- paste("Epistasis between regions on", ref, "background", bg)
  test <- my_upper_triangular_mat(alt = alt, var = var)
  paletteLength = 50
  myColors <- colorRampPalette(c("steelblue3", "gray90", "red"))(paletteLength)
  rng <- max(abs(test), na.rm = TRUE)
  myBreaks <- c(seq(-rng, 0, length.out=ceiling(paletteLength/2) + 1), 
                seq(rng/paletteLength, rng,
                    length.out=floor(paletteLength/2)))
  p <- pheatmap::pheatmap(test, color = myColors, breaks = myBreaks,
                          border_color = NA, na_col = NA, silent = TRUE,
                          cluster_cols = FALSE, cluster_rows = FALSE)
  vp <- viewport(x = 0.5, y = 0.25,
                 width = unit(4.5, "in"), height = unit(4.5, "in"), angle = 47) 
  grid.newpage()
  pushViewport(vp)
  grid.draw(p$gtable)
  popViewport()
  grid.text(label = my_title, x = 0.5, y = 0.95, gp = gpar(fontsize = 16, fontface = "bold"))
  return(p)
}
my_plot_combined_triangle_heatmap <- function(alt){
  # this function takes the output of the function my_combined_triangular_mat()
  # using pheatmap function, then rotates it using grid graphics
  # thanks to https://bookdown.org/rdpeng/RProgDA/the-grid-package.html#grid-graphics-coordinate-systems
  # adding title based on https://davetang.github.io/muse/pheatmap.html
  
  # construct title of plot
  ref = ifelse(alt == "C", "ScPho4", "CgPho4")
  my_title <- paste("Epistasis between regions on", ref, "background")
  test <- my_combined_triangular_mat(alt = alt)
  paletteLength = 50
  myColors <- colorRampPalette(c("steelblue", "gray90", "red"))(paletteLength)
  rng <- max(abs(test), na.rm = TRUE)
  myBreaks <- c(seq(-rng, 0, length.out=ceiling(paletteLength/2) + 1), 
                seq(rng/paletteLength, rng,
                    length.out=floor(paletteLength/2)))
  p <- pheatmap::pheatmap(test, color = myColors, breaks = myBreaks,
                          border_color = NA, na_col = NA, silent = TRUE,
                          cluster_cols = FALSE, cluster_rows = FALSE)
  vp <- viewport(x = 0.5, y = 0.45,
                 width = unit(3, "in"), height = unit(2.8, "in"), angle = 47) 
  grid.newpage()
  pushViewport(vp)
  grid.draw(p$gtable)
  popViewport()
  grid.text(label = my_title, x = 0.5, y = 0.95, 
            gp = gpar(fontsize = 16, fontface = "bold"))
  grid.text(label = "With Pho2", x = 0.1, y = 0.65, just = c("left", "top"),
            gp = gpar(fontsize = 14, fontface = "bold"))
  grid.text(label = "Without pho2", x = 0.1, y = 0.25, just = c("left", "top"), 
            gp = gpar(fontsize = 14, fontface = "bold"))
  return(p)
}
png("../img/20240115-triangle-heatmap-CgPho4-ref.png", width = 7, height = 5, units = "in", res = 300)
p1 <- my_plot_combined_triangle_heatmap("C")
dev.off()
png("../img/20240115-triangle-heatmap-ScPho4-ref.png", width = 7, height = 5, units = "in", res = 300)
p2 <- my_plot_combined_triangle_heatmap("S")
dev.off()
```

---
title: "E013 Pho4 chimera activity analysis using PHO5 reporter, analysis"
author: "Bin He"
date: "2023-10-31 updated `r Sys.Date()`"
output:
  html_notebook:
    toc: yes
    toc_float: yes
    code_folding: hide
  pdf_document:
    toc: yes
  html_document:
    toc: yes
    df_print: paged
---

```{r message=FALSE}
require(plotly)
require(tidyverse)
require(ggridges)
require(cowplot)
require(RColorBrewer)
require(grid)
require(ggtext)
```

```{r}
old <- theme_set(theme_bw(base_size = 16))
```

This is the second part of the analysis. In the first part (see `../input/PHO5-data/20231019-pool-qc-PHO5.Rmd`), I did QC and exported the filtered dataset. Here, I will continue working with that dataset and answer our biological questions.

# Goal

-   Analyze the full chimera set flow results for *PHO5pr*-mCherry reporter.
-   Develop an analysis pipeline to perform QC, correction (if needed) and plotting the results.

# Data

Import the background subtracted data

```{r}
dat0 <- read_tsv("../input/20231023-PHO5-bg-subtracted-data.tsv", col_types = "ccccdddddc")
```

Filter the data

```{r}
dat <- filter(dat0, host != "PHO84", flag == "pass", date != "02/10") %>% 
  # based on previous QC, the following sample (both replicates) have high
  # variance - one biological replicate is highly expressed, while the other 
  # two have mNeon, but barely any RFP expression.
  mutate(
    host = fct_recode(host, pho2 = "pho2∆"),
    flag = ifelse(plasmid == "233" & host == "pho2", "high.var", flag))
```

Number of replicates left for each sample

```{r}
expt <- dat %>% 
  filter(host %in% c("PHO2", "pho2"), !plasmid %in% c("188", "194")) %>% 
  group_by(date, plasmid, host) %>% 
  summarize(n = n(), .groups = "drop")

expt %>% 
  ggplot(aes(x = plasmid, y = n)) +
  geom_col(aes(fill = host)) + 
  facet_grid(date ~ .) +
  scale_fill_manual(values = c("PHO2" = "gray30", "pho2" = "gray70")) +
  theme_minimal() + background_grid(major = "none") + panel_border(size = 0.5) +
  scale_y_continuous(name = "Replicates", breaks = c(6)) + xlab(NULL) +
  theme(axis.text.x = element_text(angle = 90),
        strip.text.y = element_text(angle = 0),
        legend.position = "top")
```

Chimera makeup information

```{r}
meta <- read_tsv("../input/20230208-chimera-Pho4-makeup.txt", col_types = "ccccc")
```

## Summarize data

Here we would like to calculate the ratio of RFP/GFP for each chimera (plasmid) across all replicates, including those from different days. Note that the parameter of interest is a ratio, which can be estimated using either "means of ratios" or "ratios of means". These are two specific instances of a more general weighted estimator, differing only in the choice of weights. The "means of ratios" first calculates the ratio for each replicate within a plasmid, then averages them. In this calculation, each replicate is given an equal weight of 1/n. The "ratios of means" first sums up the GFP and RFP values separately across the replicates for each plasmid, then takes the ratio between them. In this estimator, the weight for each replicate is x / sum(x), where x is the denominator in the ratio, i.e., GFP. In other words, this estimator gives more weight to the replicates where the chimera had a higher expression level.

Both estimators are known to be biased. We will ignore that for the moment. As for the choice between the two, there seems to be no reason to give more weight to the experiments with a higher GFP signal, so the "means of ratios" seems the more natural choice. However, we will calculate both and decide later.
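
The relationship between the two estimators and their weights can be checked on a toy example. The sketch below uses made-up numbers for a single hypothetical plasmid (the `toy_*` names are illustrative and not part of the dataset) and only base R.

```r
# made-up replicate measurements for one hypothetical plasmid
toy_gfp <- c(100, 200, 400)      # denominator x, stand-in for BL1.H
toy_rfp <- c(50, 120, 160)       # numerator y, stand-in for YL2.H
toy_ratio <- toy_rfp / toy_gfp   # per-replicate ratios

# "means of ratios": every replicate gets the weight 1/n
mean_of_ratios <- mean(toy_ratio)

# "ratios of means": replicate i gets the weight x_i / sum(x)
ratio_of_means <- mean(toy_rfp) / mean(toy_gfp)
toy_w <- toy_gfp / sum(toy_gfp)
weighted_form <- sum(toy_w * toy_ratio)

# the weighted form reproduces the ratio of means
all.equal(ratio_of_means, weighted_form)
```

In this toy case the replicate with the highest GFP has the lowest ratio, so the ratio of means comes out lower than the mean of ratios.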

A final question is how to calculate the variance of the ratio estimate. According to the `survey` package [manual](https://rstudio-pubs-static.s3.amazonaws.com/178965_fb60a0f7bbb44a6ea219713fb1a89a22.html), an approximate estimator for the variance is

$$
\begin{aligned}
r &= \frac{\bar{y}}{\bar{x}}, \quad \text{where}\ \bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_i\ \text{and}\ \bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i \\
\hat{V}(r) &= \left(1-\frac{n}{N}\right)\left(\frac{1}{\bar{x}^2}\right)\frac{s_r^2}{n}, \quad \text{where}\ s_r^2=\frac{1}{n-1}\sum_{i=1}^{n}(y_i-rx_i)^2
\end{aligned}
$$

Assuming that $N \gg n$, we can ignore the first factor (the finite population correction) in the variance estimator. The rest can be calculated from the data.

```{r}
datsum <- dat %>%
  filter(!is.na(plasmid)) %>% 
  group_by(plasmid, host) %>% 
  summarize(
     n = n(),
    mG = mean(BL1.H),
    mR = mean(YL2.H),
     A = mean(YL2.H/BL1.H),
     r = mR/mG,
    s2 = 1/(n-1)*sum((YL2.H - r*BL1.H)^2),
    vr = 1/(mG^2)*s2/n,
    se = sqrt(vr),
    .groups = "drop"
  ) %>% 
  select(-s2, -r, -vr)# %>% 
  #pivot_wider(names_from = host, values_from = BL1.H:`nR/G`) %>% 
  #mutate(`pho2∆/PHO2` = `R/G_pho2∆`/`R/G_PHO2`,
  #       `n.pho2∆/PHO2` = `nR/G_pho2∆`/`nR/G_PHO2`)
```
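
As a sanity check of the variance formula, here is a minimal sketch that wraps the same arithmetic in a small function and applies it to made-up numbers. The helper `toy_ratio_se()` and the `toy_*` vectors are hypothetical and are not used elsewhere in this notebook.

```r
# hypothetical helper: ratio of means and its approximate standard error,
# dropping the (1 - n/N) factor as in the text
toy_ratio_se <- function(x, y) {
  n  <- length(x)
  r  <- mean(y) / mean(x)               # ratio of means
  s2 <- sum((y - r * x)^2) / (n - 1)    # s_r^2 in the formula
  vr <- s2 / (mean(x)^2 * n)            # approximate variance of r
  list(r = r, se = sqrt(vr))
}

toy_x <- c(100, 200, 400)   # made-up GFP values
toy_y <- c(50, 120, 160)    # made-up RFP values
toy_est <- toy_ratio_se(toy_x, toy_y)

# by construction, the residuals y - r*x sum to zero
toy_resid_sum <- sum(toy_y - toy_est$r * toy_x)
```

Note that in the chunk above, `se` is computed around the ratio of means `r`, whereas the reported activity `A` is the mean of ratios.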

For each chimera, we would also like to calculate **three values**:

1.  A in *pho2∆*: this is its base activity without Pho2
2.  A in *PHO2*: this is its full activity with Pho2
3.  A_PHO2 / A_pho2∆: this is the Pho2 enhancement of activity

We assign the chimeras into several groups, based on their A_PHO2 and A_PHO2/A_pho2∆

```{r}
ximera <- datsum %>%
  pivot_wider(id_cols = plasmid, names_from = host,
              values_from = c(A, se)) %>% 
  mutate(
    rA_PHO2 = A_PHO2 / A_PHO2[plasmid == "194"],
    rA_pho2 = A_pho2 / A_pho2[plasmid == "194"],
    boost = A_PHO2 / A_pho2,
    group = case_when(
      plasmid %in% c("188", "194") ~ "ref",
      rA_PHO2 < 0.2                ~ "n.f.",
      .default = "chimera"
    ),
    group = fct_relevel(group, "ref", "chimera", "n.f.")
  ) %>% 
  left_join(select(meta, plasmid, set, symbol), by = "plasmid") %>% 
  mutate(symbol = fct_reorder(symbol, rA_PHO2, .desc = TRUE)) %>% 
  relocate(c(set, symbol, group), .after = plasmid)
```
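
To make the grouping rule concrete, the sketch below applies the same steps (reshape by host, scale to the ScPho4 reference, threshold at 0.2) to a made-up table. The plasmid ids `p1` and `p2` and all activity values are hypothetical.

```r
library(dplyr)
library(tidyr)

# made-up summarized activities; "194" plays the role of the ScPho4 reference
toy_sum <- data.frame(
  plasmid = rep(c("194", "p1", "p2"), each = 2),
  host    = rep(c("PHO2", "pho2"), times = 3),
  A       = c(1.0, 0.1, 0.8, 0.6, 0.1, 0.05)
)

toy_wide <- toy_sum %>%
  pivot_wider(id_cols = plasmid, names_from = host, values_from = A,
              names_prefix = "A_") %>%
  mutate(
    rA_PHO2 = A_PHO2 / A_PHO2[plasmid == "194"],  # relative to the reference
    boost   = A_PHO2 / A_pho2,                    # Pho2 enhancement
    group   = case_when(
      plasmid == "194" ~ "ref",
      rA_PHO2 < 0.2    ~ "n.f.",      # below 20% of the reference activity
      TRUE             ~ "chimera"
    )
  )
toy_wide
```

Here `p1` retains 80% of the reference activity and is kept as a chimera, while `p2` falls below the 0.2 cutoff and is labeled non-functional.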

Export the summarized data

```{r}
write_tsv(ximera, file = "../output/20231125-PHO5pr-chimera-summarized.tsv")
```

To be able to plot all the data points, let's generate another data frame with the individual ratios.

```{r}
dat_sep <- dat %>%
  filter(!is.na(plasmid)) %>% 
  mutate(A = YL2.H/BL1.H) %>% 
  select(plasmid, host, BL1.H, YL2.H, A, flag) %>% 
  left_join(select(meta, plasmid, set, symbol), by = "plasmid")# %>% 
  #mutate(symbol = fct_reorder(symbol, rA_PHO2, .desc = TRUE))# %>% 
```

# Analysis

## Plotting functions

<!---
Set up common parameters for thresholding and plotting
```{r}
# reference Pho4 plasmid ids
refs <- c("188", "194")
# colors
date.colors = c(brewer.pal(name="Dark2", n = 8), brewer.pal(name="Paired", n = 8))
host.colors = c("PHO2" = "gray30", "pho2" = "gray70")
point.colors = c("PHO2" = "forestgreen", "pho2" = "purple4")
# 
```
--->

Source the scripts

```{r}
source("../script/20240211-chimera-data-plotting-functions.R")
```

Modify the component plotting function for special purposes

```{r}
my_plot_ratio <- function(selection){
  # custom colors for this function
  date.colors = c(brewer.pal(name="Dark2", n = 8), brewer.pal(name="Paired", n = 8))
  host.colors = c("PHO2" = "gray30", "pho2" = "gray70")
  point.colors = c("PHO2" = "forestgreen", "pho2" = "purple4")
  # prepare data
  tmp <- my_data_prep(selection)
  # plotting
  p <- tmp %>% 
    select(-c(FSC.H, nGFP, nRFP, flag)) %>% 
    mutate(`R/G` = YL2.H/BL1.H) %>% 
    pivot_longer(cols = c(BL1.H, YL2.H, `R/G`), 
                 names_to = "parameter", values_to = "value") %>% 
    mutate(parameter = factor(parameter, levels = c("R/G", "YL2.H", "BL1.H"),
                              labels = c("RFP/GFP", "PHO5pRFP", "Pho4-GFP"))) %>% 
    ggplot(aes(x = symbol, y = value, group = host)) + 
    stat_summary(aes(group = host), fun.data = "mean_cl_boot", geom = "errorbar",
                 position = position_dodge(0.5), width = 0.3) +
    geom_bar(aes(fill = host), width = 0.5, alpha = 0.8,
             stat = "summary", fun = "mean", position = position_dodge(0.5)) +
    geom_point(data = function(x) subset(x, !symbol %in% c("CCCCC", "SSSSS")),
               aes(group = host, color = date), size = 1, shape = 3, alpha = 0.9,
               position = position_jitterdodge(dodge.width = 0.5, jitter.width = 0.1)) +
    scale_color_manual(values = date.colors, guide = "none") +
    #geom_point(data = function(x) subset(x, !symbol %in% c("CCCCC", "SSSSS")),
    #           aes(group = host, color = host), size = 1, shape = 3, alpha = 0.9,
    #           position = position_jitterdodge(dodge.width = 0.5, jitter.width = 0.1)) +
    #scale_color_manual(values = point.colors) +
    scale_fill_manual(values = host.colors) +
    facet_grid(parameter~group, scales = "free", space = "free_x") +
    theme_bw(base_size = 18) + background_grid(minor = "none") + 
    xlab("Pho4 chimera") +
    theme(axis.text.x = element_text(angle = 30, hjust = 1, family = "mono"),
          legend.position = "top",
          axis.title = element_blank())
  return(p) 
}
```

## Pho4 chimera protein level variation

What is the distribution of Pho4 chimera protein levels? Is the mCherry/mNeon ratio a faithful measure of the chimera's activities?

Distribution of Pho4-mNeon levels grouped by plasmid and host.

```{r}
host.labels = c("PHO2", "pho2∆")
point.colors = c("PHO2" = "forestgreen", "pho2" = "purple4")
p1 <- dat %>% 
  filter(!is.na(plasmid)) %>% 
  mutate(plasmid = fct_reorder(plasmid, BL1.H, .fun = median) %>% 
           fct_relevel("194", "188")) %>% 
  ggplot(aes(x = plasmid, y = BL1.H)) +
  geom_point(aes(color = host), position = position_jitter(0.1),
             size = 1.1) + 
  scale_color_manual("Host", values = point.colors, labels = host.labels) +
  scale_y_log10(breaks = c(100, 1000, 10000), expand = expansion(mult = 0.1)) +
  scale_x_discrete(expand = expansion(mult = 0.03)) +
  xlab("Pho4 constructs") + ylab("Pho4-mNeon (a.u.)") +
  theme_cowplot() + panel_border(color = "gray30", size = 1.2) +
  theme(axis.text.x = element_text(angle = 90, size = rel(0.6), vjust = 0.5),
        axis.text.y = element_text(size = rel(0.8)),
        axis.title = element_text(size = rel(0.9)),
        axis.line = element_blank(),
        legend.position = c(0.05, 0.9),
        legend.direction = "horizontal",
        legend.text = element_text(face = 3))
p1
#ggsave("../img/20240307-Pho4-chimera-protein-level-variation.png", 
#       width = 6, height = 3)
```

Relationship between *PHO5pr*-mCherry and Pho4-mNeon levels for ScPho4 (pH194) in the *PHO2* host

```{r}
tmp <- dat %>% 
  filter(plasmid == "194", host == "PHO2")

lm <- lm(YL2.H ~ BL1.H, data = tmp)
summary(lm)

p2 <- tmp %>% 
  ggplot(aes(x = BL1.H, y = YL2.H)) +
  geom_point(size = 1.5) + 
  stat_smooth(method = "lm", formula = y ~ x) +
  xlab("Pho4-mNeon") + ylab("PHO5pr-mCherry") +
  theme_cowplot() + 
  panel_border(color = "gray30", size = 1.2) +
  theme(axis.line = element_blank(),
        axis.title = element_text(size = rel(1.2))
  )

p2
ggsave("../img/20240307-ScPho4-mCherry-vs-mNeon-consistent.png", 
       width = 4, height = 3.5)
```

```{r}
host.colors =  c("PHO2" = "gray60", "pho2" = "orange")

tmp <- dat %>% 
  filter(plasmid %in% c("188", "194"), 
         date %in% c("02/08", "02/11", "02/18", "02/21", "02/23", "03/31")) %>% 
  mutate(A = YL2.H/BL1.H,
         Pho4 = factor(plasmid, levels = c("194", "188"), 
                       labels = c("ScPho4", "CgPho4"))) 
p3 <- tmp %>% 
  ggplot(aes(x = date, y = A)) + 
  geom_bar(aes(fill = host), stat = "summary", fun = "mean", 
           position = position_dodge(0.9), alpha = 0.9) +
  geom_point(aes(group = host), size = 0.6, shape = 3,
             position = position_dodge(width = 0.9)) + 
  scale_fill_manual("Host", values = host.colors, labels = host.labels) +
  scale_x_discrete(labels = 1:6) +
  #stat_summary(fun.data = "mean_se", geom = "pointrange", color = "red") +
  facet_grid(Pho4 ~ .) +
  ylab("mCherry/mNeon") + xlab("Replicate") +
  theme_cowplot() + panel_border(color = "gray30", size = 1.2)+
  theme(axis.text = element_text(size = rel(0.7)),
        axis.title = element_text(size = rel(1)),
        axis.line = element_blank(),
        strip.background = element_blank(),
        legend.position = "top",
        legend.title = element_text(size = rel(0.9)),
        legend.text = element_text(size = rel(0.8), face = 3))
p3
#ggsave("../img/20240307-CgPho4-mCherry-vs-mNeon-consistent.png", 
#       width = 4, height = 3.2)

# sample size per day of experiment
tmp %>% count(date, Pho4, host) %>% 
  pivot_wider(names_from = host, values_from = n)
```

## High variance samples

Summarize the background subtracted data by calculating the means and cv for each strain.

```{r}
cv <- dat %>% 
  select(-nGFP, -nRFP) %>%
  pivot_longer(FSC.H:YL2.H, names_to = "parameter", values_to = "intensity") %>% 
  group_by(date, plasmid, host, parameter) %>% 
  summarize(
    n = n(),
    mean = mean(intensity),
    cv = sd(intensity)/mean(intensity),
    .groups = "drop"
  ) %>% 
  arrange(desc(cv))
```

Use the control strain (pH194 with PHO2) to identify and correct for systematic biases

```{r}
control <- filter(dat, plasmid == "194", host == "PHO2") %>% 
  separate(well, into = c("row", "col"), sep = 1) %>% 
  droplevels()
```

Model for mNeon

```{r}
gfp.model.0 <- lm(BL1.H ~ log10(events) + date + row*col, data = control)
step(gfp.model.0)
gfp.model.1 <- lm(BL1.H ~ date + col, data = control)
```

Model for PHO5pr::RFP

```{r}
rfp.model.0 <- lm(YL2.H ~ log10(events) + date + row*col, data = control)
step(rfp.model.0)
rfp.model.1 <- lm(YL2.H ~ log10(events) + date + row + col, data = control)
```

> There are more systematic shifts in RFP: the effects of row, col, date and the number of events are all significant. However, I won't remove these effects yet, because the RFP/GFP ratios are quite consistent across days. In other words, the variation in GFP and RFP may cancel out in the ratio.
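
The model reduction used above, fitting a full model and letting `step()` drop terms by AIC, can be illustrated on simulated data. This is only a sketch: the data frame `toy_ctrl` and its effect sizes are made up, with a strong date shift and no column effect.

```r
set.seed(1)
# simulated control measurements: 3 dates x 4 plate columns (made-up design)
toy_ctrl <- data.frame(
  date = factor(rep(c("d1", "d2", "d3"), each = 20)),
  col  = factor(rep(1:4, times = 15))
)
toy_shift <- c(d1 = 0, d2 = 50, d3 = 100)   # hypothetical per-date shift
toy_ctrl$signal <- 1000 + toy_shift[as.character(toy_ctrl$date)] +
  rnorm(nrow(toy_ctrl), sd = 5)

toy_full <- lm(signal ~ date + col, data = toy_ctrl)
toy_sel  <- step(toy_full, trace = 0)       # backward selection by AIC
toy_terms <- attr(terms(toy_sel), "term.labels")
toy_terms
```

The date term is retained because it explains most of the variance. Whether a term with no real effect, such as `col` here, is dropped depends on the noise in a given simulation.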

Check for each plasmid how consistent are the measurements between days

```{r}
tmp <- dat %>% 
  # remove one sample with only one valid day of experiment
  filter(!(plasmid == "218" & host == "PHO2"), !plasmid %in% c("188", "194", NA)) %>% 
  nest(data = c(date, BL1.H, YL2.H), .by = c(plasmid, host))

day.var.gfp <- tmp %>% 
  mutate(model = map(data, function(df) lm(BL1.H ~ date, data = df)),
         tidied = map(model, broom::tidy)) %>% 
  unnest(tidied) %>% 
  filter(term != "(Intercept)") %>% 
  mutate(p.adj = p.adjust(p.value, method = "BH")) %>% 
  select(-data, -model) %>% 
  filter(p.adj < 0.10) %>% 
  arrange(plasmid, host)

day.var.rfp <- tmp %>% 
  mutate(model = map(data, function(df) lm(YL2.H ~ date, data = df)),
         tidied = map(model, broom::tidy)) %>% 
  unnest(tidied) %>% 
  filter(term != "(Intercept)") %>% 
  mutate(p.adj = p.adjust(p.value, method = "BH")) %>% 
  select(-data, -model) %>% 
  filter(p.adj < 0.10) %>% 
  arrange(plasmid, host)
```
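
In the chunk above, `p.adjust()` is called after `unnest()`, so the Benjamini-Hochberg correction is applied across the date terms of all plasmid and host combinations at once. The sketch below shows what the correction does to a made-up vector of p-values, with the more conservative Holm method for comparison.

```r
toy_p <- c(0.01, 0.02, 0.03, 0.5)   # made-up raw p-values

# Benjamini-Hochberg: controls the false discovery rate
toy_bh <- p.adjust(toy_p, method = "BH")
toy_bh     # 0.04 0.04 0.04 0.50

# Holm: controls the family-wise error rate, never less strict than BH
toy_holm <- p.adjust(toy_p, method = "holm")
toy_holm   # 0.04 0.06 0.06 0.50
```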

```{r}
# reference Pho4 plasmid ids
refs <- c("188","194")
# plasmids with significant day-to-day variance
day.var.gfp.list <- unique(day.var.gfp$plasmid)
day.var.rfp.list <- unique(day.var.rfp$plasmid)
```

High day-to-day GFP variance: `r day.var.gfp.list`

High day-to-day RFP variance: `r day.var.rfp.list`

Plotting components for chimeras with high day-to-day variance in Pho4-mNeon

```{r}
p <- my_plot_ratio(c(refs,day.var.gfp.list))# + 
p
```

> Watch out for CSCscC, SCCsS, SCCsS

Plotting components for chimeras with high day-to-day variance in *PHO5pr*-mCherry

```{r}
p <- my_plot_ratio(c(refs,day.var.rfp.list))
p
```

> Most of the day-to-day variance is canceled out after RFP/GFP normalization.
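
One way to see why the ratio can be more stable than its components: if a day-specific factor scales both channels equally, it drops out of RFP/GFP. The sketch below assumes such a purely multiplicative, shared day effect, which is an idealization, and uses made-up numbers.

```r
# hypothetical per-day gain that affects both channels equally
toy_scale <- c(day1 = 1.0, day2 = 1.3, day3 = 0.8)
toy_gfp_day <- 500 * toy_scale   # made-up Pho4-mNeon signal
toy_rfp_day <- 250 * toy_scale   # made-up PHO5pr-mCherry signal

toy_cv <- function(x) sd(x) / mean(x)   # coefficient of variation

toy_cv_gfp   <- toy_cv(toy_gfp_day)                 # clearly above zero
toy_cv_ratio <- toy_cv(toy_rfp_day / toy_gfp_day)   # essentially zero
```

Real data will deviate from this idealization, for example when a day effect hits one channel more than the other.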

## All chimera, scatter plot

Plot all chimeras, coloring based on P2ID source
```{r}
my_scatter_plot_fix <- function(){
  # this function is the same as the one in the script file, but is used to 
  # plot region 4 effects alone, and doesn't take any input
  s1 = my_data_select(pattern = "XXXCCX")
  s2 = my_data_select(pattern = "XXXSSX")
  scatter.colors = c("ScPho4" = "forestgreen", "CgPho4" = "blue3", 
                     "P2ID:Cg" = "deepskyblue", "P2ID:Sc" = "palegreen2",
                     "P2ID:mixed" = "gray20")
  scatter.size = c("ScPho4" = 3.5, "CgPho4" = 3.5,
                     "P2ID:Cg" = 2.5, "P2ID:Sc" = 2.5, "P2ID:mixed" = 2.5)
  p <- ximera %>% 
    # exclude the alternative break point sets "A" and "B"
    # in particular, pH294 is an alternative break point CCCcs, where P2ID:Cg
    # extends to aa 270 instead of 458. it has nearly the same activities as
    # CgPho4. However, the CCCCS in the main set has significantly reduced
    # activities both with and without Pho2. We later tested whether the 
    # additional P2ID:Cg4 rescues the effect (see Cg4ext below) and it didn't.
    # So far, this seems to be a one-off. We need to further investigate its
    # activities.
    filter(set %in% c("M", "S")) %>% 
    mutate(A_PHO2 = signif(A_PHO2, digits = 2),
           A_pho2 = signif(A_pho2, digits = 2),
           group = case_when(
             symbol == "CCCCC" ~ "CgPho4",
             symbol == "SSSSS" ~ "ScPho4",
             plasmid %in% s1 ~ "P2ID:Cg",
             plasmid %in% s2 ~ "P2ID:Sc",
             .default = "P2ID:mixed"
           ),
           group = fct_relevel(group, names(scatter.colors))) %>% 
    ggplot(aes(x = A_PHO2, y = A_pho2, label = symbol)) + 
    geom_abline(slope = 1) +
    geom_point(aes(color = group, size = group)) + 
    scale_color_manual(NULL, values = scatter.colors) +
    scale_size_manual(values = scatter.size, guide = "none") +
    labs(x = bquote(A[PHO2]), y = bquote(A[pho2*Delta])) +
    theme_cowplot() + panel_border(color = "gray30", size = 1.2) +
    theme(legend.text = element_text(size = rel(0.8)),
          legend.position = c(0.03, 0.83),
          axis.title = element_text(face = 2, size = rel(1.2)),
          axis.line = element_blank())
  return(p)
}
```

```{r}
p <- my_scatter_plot_fix()
ggsave(filename = "../img/20240308-all-chimera-scatter-color-by-P2ID.png",
       plot = p, width = 4.5, height = 4, dpi = 300)
ggplotly(p + labs(x = "A<sub>PHO2</sub>", y = "A<sub>pho2</sub>") +
           theme_gray(base_size = 16) +
           theme(legend.text = element_markdown()), 
         tooltip = c("label", "x", "y"))
```

This function is the same as `my_scatter_plot_fix()` except that it plots all the chimeras without coloring them differently. It is used for Figure 5.
```{r}
my_scatter_plot_all <- function(){
  # this function is the same as my_scatter_plot_fix except that it plots all the chimeras
  # without coloring them differently. for figure 5
  s1 = my_data_select(pattern = "XXXCCX")
  s2 = my_data_select(pattern = "XXXSSX")
  scatter.colors = c("ScPho4" = "forestgreen", "CgPho4" = "blue3", 
                     "P2ID:Cg" = "gray20", "P2ID:Sc" = "gray20",
                     "P2ID:mixed" = "gray20")
  scatter.size = c("ScPho4" = 3.5, "CgPho4" = 3.5,
                   "P2ID:Cg" = 2.5, "P2ID:Sc" = 2.5, "P2ID:mixed" = 2.5)
  p <- ximera %>% 
    filter(set %in% c("M", "S")) %>% 
    mutate(A_PHO2 = signif(A_PHO2, digits = 2),
           A_pho2 = signif(A_pho2, digits = 2),
           group = case_when(
             symbol == "CCCCC" ~ "CgPho4",
             symbol == "SSSSS" ~ "ScPho4",
             plasmid %in% s1 ~ "P2ID:Cg",
             plasmid %in% s2 ~ "P2ID:Sc",
             .default = "P2ID:mixed"
           ),
           group = fct_relevel(group, names(scatter.colors))) %>% 
    ggplot(aes(x = A_PHO2, y = A_pho2, label = symbol)) + 
    geom_abline(slope = 1) +
    geom_point(aes(color = group, size = group)) + 
    scale_color_manual(NULL, values = scatter.colors) +
    scale_size_manual(values = scatter.size, guide = "none") +
    labs(x = bquote(A[PHO2]), y = bquote(A[pho2])) +
    theme_cowplot() + panel_border(color = "gray30", size = 1.2) +
    theme(legend.text = element_text(size = rel(0.8)),
          legend.position = "none",
          axis.title = element_text(face = 2, size = rel(1.2)),
          axis.line = element_blank())

  return(p)
}
```

Plot all chimeras, for Fig. 5
```{r}
p <- my_scatter_plot_all()
ggsave(filename = "../img/20241121-all-chimera-scatter.png",
       plot = p, width = 4.5, height = 4, dpi = 300)
ggplotly(p + labs(x = "A<sub>PHO2</sub>", y = "A<sub>pho2</sub>") +
           theme_gray(base_size = 16) +
           theme(legend.text = element_markdown()), 
         tooltip = c("label", "x", "y"))
```
## Spotlight individual chimeras

The goal here is to plot individual chimeras in order to test specific hypotheses and make certain points.

1.  We separately tested and found that CgPho4 DBD binds the consensus DNA more strongly than ScPho4 does, and it also has two additional activation booster regions, which enhance the activity of the main AD. We therefore hypothesize that by replacing the corresponding regions in ScPho4 with the parts from CgPho4, we would create a chimeric TF that is not or far less dependent on Pho2.
2.  We also expect that those regions additively contribute to the reduced Pho2-dependence, shown as increased TF activity of the chimera in the *pho2∆* background.

Design plot

```{r}
my_plot_subset_ximera <- function(symbols){
  # this function plots a subset of the chimeras as horizontal bar plots
  # showing the Rel. A_PHO2 and %A_pho2∆ values
  # it takes as input a vector containing the symbols for the chimeras for 
  # plotting. the order in the vector determines the plot order
  # the endogenous ScPho4 and CgPho4 are implied
  missing <- setdiff(symbols, ximera$symbol)
  if(length(missing) != 0)
    # collapse first so that the error message is a single string
    stop(paste(paste(missing, collapse = ", "), "are not found"))
  
  tmp <- filter(ximera, symbol %in% c("SSSSS", "CCCCC", symbols)) %>% 
    mutate(
      rSE_PHO2 = se_PHO2 / A_PHO2[symbol == "SSSSS"],
      rSE_pho2 = se_pho2 / A_pho2[symbol == "SSSSS"]
    ) %>% 
    pivot_longer(cols = c(rA_PHO2, rA_pho2, rSE_PHO2, rSE_pho2), 
                 #pivot_longer(cols = c(A_PHO2, A_pho2, se_PHO2, se_pho2), 
                 names_to = c(".value", "parameter"), names_sep = "_",
                 values_to = "value") %>% 
    mutate(parameter = fct_relevel(parameter, "PHO2"),
           symbol = factor(symbol, levels = 
                             unique(c("SSSSS", "CCCCC", symbols)))) %>% 
    select(-c(A_PHO2:boost))
  
  # labeller
  par.explain <- c(
    PHO2 = "Rel. A<sub>PHO2</sub>",
    #boost = "Boost",
    pho2 = "Rel. A<sub>pho2∆</sub>"
  )
  
  p <- ggplot(tmp, aes(y = symbol, x = rA)) +
    geom_col(width = 0.5, color = "black", fill = "gray80") +
    geom_vline(xintercept = 1, linetype = 2, color = "gray30") +
    geom_errorbar(aes(xmin = rA - rSE, xmax = rA + rSE), width = 0.2) +
    facet_wrap(~parameter, scales = "free_x",# switch = "x",
              labeller = labeller(parameter = par.explain)) +
    scale_y_discrete(limits = rev) + 
    scale_x_continuous(expand = expansion(mult = c(0.02, 0.05))) +
    theme_cowplot() + panel_border(color = "gray30") +
    background_grid(major = "y", minor = "none") +
    theme(axis.text.y = element_text(family = "courier"),
          axis.title = element_blank(),
          axis.line = element_blank(),
          strip.placement = "outside",
          strip.background = element_blank(),
          strip.text = element_markdown())
  return(p)
}
```
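
The reshaping step inside `my_plot_subset_ximera()` relies on the `".value"` sentinel of `pivot_longer()`: the part of each column name before the `_` becomes an output column (`rA`, `rSE`) and the part after it becomes the value of `parameter`. A minimal sketch with made-up numbers:

```r
library(tidyr)

# made-up relative activities and standard errors for two symbols
toy_rel <- data.frame(
  symbol   = c("SSSSS", "CSSSS"),
  rA_PHO2  = c(1.0, 0.8),
  rA_pho2  = c(1.0, 3.0),
  rSE_PHO2 = c(0.05, 0.04),
  rSE_pho2 = c(0.10, 0.20)
)

toy_long <- pivot_longer(
  toy_rel, cols = -symbol,
  names_to = c(".value", "parameter"), names_sep = "_"
)
toy_long   # one row per symbol and host, with columns rA and rSE
```

When `".value"` is used, the value column names come from the column names themselves, so a `values_to` argument has no effect.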

<font color="red">Update 2024-11-22</font>

Alternative design, with individual points and not relative to ScPho4. Also, individual datapoints were plotted for samples with \<10 replicates

```{r}
my_plot_subset_ximera_alt <- function(symbols){
  # this function plots a subset of the chimeras as horizontal bar plots
  # showing the absolute A_PHO2 and A_pho2∆ values (not relative to ScPho4),
  # with the individual data points overlaid for the chimeras
  # it takes as input a vector containing the symbols for the chimeras for 
  # plotting. the order in the vector determines the plot order
  # the endogenous ScPho4 and CgPho4 are implied
  missing <- setdiff(symbols, ximera$symbol)
  if(length(missing) != 0)
    # collapse first so that the error message is a single string
    stop(paste(paste(missing, collapse = ", "), "are not found"))
  
  tmp <- filter(dat_sep, symbol %in% c("SSSSS", "CCCCC", symbols)) %>% 
    mutate(host = fct_relevel(host, "PHO2"),
           symbol = factor(symbol, levels = 
                             unique(c("SSSSS", "CCCCC", symbols))))

  tmp %>% count(symbol, host) %>% print()
  # labeller
  par.explain <- c(
    PHO2 = "A<sub>PHO2</sub>",
    #boost = "Boost",
    pho2 = "A<sub>pho2∆</sub>"
  )
  
  p <- ggplot(tmp, aes(y = symbol, x = A)) +
    geom_bar(stat = "summary", fun = "mean", 
             width = 0.5, color = "black", fill = "gray80") +
    stat_summary(fun.data = "mean_cl_boot", geom = "linerange",
                 color = "steelblue4") +
    geom_point(data = filter(tmp, !symbol %in% c("CCCCC", "SSSSS")), 
               size = 0.6, shape = 3, color = "gray30") +
    #geom_vline(xintercept = 1, linetype = 2, color = "gray30") +
    #geom_errorbar(aes(xmin = rA - rSE, xmax = rA + rSE), width = 0.2) +
    facet_wrap(~host, scales = "free_x",# switch = "x",
              labeller = labeller(host = par.explain)) +
    scale_y_discrete(limits = rev) + 
    scale_x_continuous(expand = expansion(mult = c(0.02, 0.05))) +
    theme_cowplot() + panel_border(color = "gray30") +
    background_grid(major = "y", minor = "none") +
    theme(axis.text.y = element_text(family = "courier"),
          axis.title = element_blank(),
          axis.line = element_blank(),
          strip.placement = "outside",
          strip.background = element_blank(),
          strip.text = element_markdown())
  return(list(data = tmp, plot = p))
}
```

### Minimal CgPho4 parts for A_pho2

Which chimera has the least amount of CgPho4 and yet has appreciable activity in the absence of Pho2? The chimeras examined include SSSSS, CCCCC, SSSSC, CSSSS, SSCSS, CSCSS, CSSSC, CSCSC.

```{r}
selected <- as.character(
  expression(CSSSS, SCSSS, CCSSS, SSCSS, CSCSS, SCCSS, CCCSS, SSSSC, SSCSC, CSCSC, CSScC))
#selected <- filter(meta, symbol %in% selected) %>% pull(plasmid)
my_plot_subset_ximera(selected)
ggsave("../img/20240308-selected-chimera-rel-activity.png", width = 4, height = 4)
# plot with absolute A not relative, and plot individual data points
plot.sub1 <- my_plot_subset_ximera_alt(selected)
print(plot.sub1$plot)
ggsave("../img/20241122-selected-chimera-rel-activity.png", width = 4, height = 4)
# save the data for publication
plot.sub1$data %>% 
  select(plasmid_id = plasmid, chimera_makeup = symbol, host, A) %>% 
  write_tsv("../output/20250213-Fig-5C-data.tsv")
```

### Region 1-3 main effects and interactions

```{r}
# select the chimeras
selected <- as.character(
  expression(SSSSS, CSSSS, SCSSS, SSCSS, CCSSS, CSCSS, SCCSS, CCCSS)
)

# extract the data
tmp <- ximera %>% 
  filter(symbol %in% selected, set == "M") %>% 
  select(plasmid, symbol, group) %>% 
  inner_join(dat, by = "plasmid") %>% 
  mutate( `R/G` = YL2.H / BL1.H ) %>%
  filter(flag == "pass") %>% 
  select(-nRFP, -nGFP, -well, -flag)

# prepare the factor levels
# the unnamed last width (2 characters) is matched but dropped from the output
split <- c(R1 = 1, AD = 1, NLS = 1, 2)

tmp <- tmp %>% 
  separate_wider_position(symbol, split) %>% 
  mutate(across(R1:NLS, ~factor(.x, levels = c("S", "C"))))

# test A_PHO2
print("Testing A_PHO2")
lm.res <- tmp %>% 
  filter(host == "PHO2") %>% 
  lm(`R/G` ~ (R1*AD*NLS), data = .) %>% 
  summary()
# adding adjusted P-value
lm.res$coefficients <- cbind(
  coef(lm.res),
  "P.adj" = p.adjust(coef(lm.res)[,'Pr(>|t|)'], method = "holm")
)
print(lm.res)
# store the test results for plotting
res.PHO2 <- coef(lm.res)[-1,] %>% as_tibble(rownames = "component")

# test A_pho2
print("Testing A_pho2∆")
lm.res <- tmp %>% 
  filter(host == "pho2") %>% 
  lm(`R/G` ~ (R1*AD*NLS), data = .) %>% 
  summary()
# adding adjusted P-value
lm.res$coefficients <- cbind(
  coef(lm.res),
  "P.adj" = p.adjust(coef(lm.res)[,'Pr(>|t|)'], method = "holm")
)
print(lm.res)
# store the test results for plotting
res.pho2 <- coef(lm.res)[-1,] %>% as_tibble(rownames = "component")

# combine the results
test.res <- bind_rows(
  "A_PHO2" = res.PHO2, "A_pho2" = res.pho2, .id = "parameter"
)

# save the output in a text file for paper
write_tsv(test.res, file = "../output/20250213-region-1-3-linear-model-test.txt")
```
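
As a quick reference for reading the model output, the sketch below lists the terms that the full factorial formula `R1*AD*NLS` expands to, shows how the coefficient names map back to those terms once the non-reference level "C" is stripped (as done in the plotting chunk), and runs a Holm adjustment on a few made-up P-values. The P-values are invented for illustration and are not from the experiment. Note that in the chunk above the intercept's P-value is part of the adjusted set.

```r
# terms implied by the full factorial formula
trm <- attr(terms(~ R1 * AD * NLS), "term.labels")
print(trm)

# lm() appends the non-reference level ("C") to each factor name,
# e.g. "R1C:ADC"; stripping the "C" recovers the term label
coef.names <- c("R1C", "ADC", "NLSC", "R1C:ADC", "R1C:NLSC",
                "ADC:NLSC", "R1C:ADC:NLSC")
stripped <- gsub("C", "", coef.names)
print(stripped)

# Holm adjustment on made-up P-values (illustration only)
p.toy <- c(0.001, 0.02, 0.04, 0.5)
p.holm <- p.adjust(p.toy, method = "holm")
print(p.holm)
```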

Plot the result

```{r}
par.explain <- c(
  A_PHO2 = "A<sub>PHO2</sub>",
  A_pho2 = "A<sub>pho2∆</sub>"
)

p <- test.res %>% 
  rename(estimate = Estimate, se = `Std. Error`) %>% 
  mutate(
    component = gsub("C", "", component) %>% fct_inorder(),
    parameter = factor(parameter, levels = c("A_PHO2", "A_pho2")),
    sig = P.adj < 0.05
  ) %>% 
  ggplot(aes(x = component, y = estimate)) +
  geom_hline(yintercept = 0, linetype = 1, color = "gray50") +
  geom_col(aes(fill = P.adj < 0.05), width = 0.5, color = "black") +
  geom_pointrange(aes(ymin = estimate-se, ymax = estimate+se), size = 0.2) +
  facet_wrap(~parameter, scales = "free_y", nrow = 2,
             labeller = labeller(parameter = par.explain)) +
  scale_x_discrete() + 
  scale_y_continuous() +
  scale_fill_manual(NULL, 
                    values = c(`FALSE` = "gray90", `TRUE` = "gray50")) +
  theme_cowplot() + panel_border(color = "gray30") +
  background_grid(major = "y", minor = "y") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, size = rel(1)),
        axis.title = element_blank(),
        axis.line = element_blank(),
        legend.position = "bottom",
        strip.placement = "outside",
        strip.background = element_blank(),
        strip.text = element_markdown(size = rel(1)))
p
ggsave("../img/20240404-region1-3-epistasis-plot.png", width = 3.5, height = 4.5)
```

### Region 4 splits

We have so far focused on the main set with the 5 region design. In the scatter plot below, we see that there is a subset of chimeras in between the P2ID:Sc and P2ID:Cg ones. They are interesting in that their A_pho2∆/A_PHO2 ratios are intermediate.

![scatter](../img/20231220-all-chimera-scatter-color-by-P2ID.png)

```{r}
selected <- as.character(
  expression(CCCSC, CCCcsC, CCCscC, CSSCC, CSSSC, CSScsC, CSSscC))
#selected <- filter(meta, symbol %in% selected) %>% pull(plasmid)
plot.sub2 <- my_plot_subset_ximera_alt(selected)
plot.sub2$plot + scale_x_continuous(expand = expansion(mult = c(0.02, 0.15)))
ggsave("../img/20241122-P2ID-split-rel-activity.png", width = 3.5, height = 3.5)
plot.sub2$data %>% 
  select(plasmid_id = plasmid, chimera_makeup = symbol, host, A) %>% 
  write_tsv("../output/20250213-Fig-6B-data.tsv")
```

Statistical tests for group 1

```{r}
# select the chimeras
selected <- as.character(
  expression(CCCCC, CCCSC, CCCcsC, CCCscC)
)

# extract the data
tmp <- ximera %>% 
  filter(symbol %in% selected) %>% 
  select(plasmid, symbol, group) %>% 
  inner_join(dat, by = "plasmid") %>% 
  mutate( `R/G` = YL2.H / BL1.H ) %>%
  filter(flag == "pass") %>% 
  select(-nRFP, -nGFP, -well, -flag) %>% 
  mutate(symbol = factor(symbol, levels = !!selected))

# test A_PHO2
print("Testing A_PHO2")
lm.res <- tmp %>% 
  filter(host == "PHO2") %>% 
  lm(`R/G` ~ symbol, data = .) %>% 
  summary()
# adding adjusted P-value
lm.res$coefficients <- cbind(
  coef(lm.res),
  "P.adj" = p.adjust(coef(lm.res)[,'Pr(>|t|)'], method = "holm")
)
print(lm.res)
# store the test results for plotting
#res.PHO2 <- coef(lm.res)[-1,] %>% as_tibble(rownames = "component")

# test A_pho2
print("Testing A_pho2∆")
lm.res <- tmp %>% 
  filter(host == "pho2") %>% 
  lm(`R/G` ~ symbol, data = .) %>% 
  summary()
# adding adjusted P-value
lm.res$coefficients <- cbind(
  coef(lm.res),
  "P.adj" = p.adjust(coef(lm.res)[,'Pr(>|t|)'], method = "holm")
)
print(lm.res)
```

Statistical tests for group 2

```{r}
# select the chimeras
selected <- as.character(
  expression( CSSCC, CSSSC, CSScsC, CSSscC )
)

# extract the data
tmp <- ximera %>% 
  filter(symbol %in% selected) %>% 
  select(plasmid, symbol, group) %>% 
  inner_join(dat, by = "plasmid") %>% 
  mutate( `R/G` = YL2.H / BL1.H ) %>%
  filter(flag == "pass") %>% 
  select(-nRFP, -nGFP, -well, -flag) %>% 
  mutate(symbol = factor(symbol, levels = !!selected))

# test A_PHO2
print("Testing A_PHO2")
lm.res <- tmp %>% 
  filter(host == "PHO2") %>% 
  lm(`R/G` ~ symbol, data = .) %>% 
  summary()
# adding adjusted P-value
lm.res$coefficients <- cbind(
  coef(lm.res),
  "P.adj" = p.adjust(coef(lm.res)[,'Pr(>|t|)'], method = "holm")
)
print(lm.res)
# store the test results for plotting
#res.PHO2 <- coef(lm.res)[-1,] %>% as_tibble(rownames = "component")

# test A_pho2
print("Testing A_pho2∆")
lm.res <- tmp %>% 
  filter(host == "pho2") %>% 
  lm(`R/G` ~ symbol, data = .) %>% 
  summary()
# adding adjusted P-value
lm.res$coefficients <- cbind(
  coef(lm.res),
  "P.adj" = p.adjust(coef(lm.res)[,'Pr(>|t|)'], method = "holm")
)
print(lm.res)
```

## Region main effect

```{r}
split <- c(1,1,1,1,1); names(split) <- paste0("P", 1:5)
tmp <- ximera %>% 
  filter(set == "M", group != "n.f.") %>% 
  separate_wider_position(symbol, split) %>% 
  mutate(across(P1:P5, ~factor(.x, levels = c("S", "C"))))
# avoid masking the lm() function with the fitted model object
lm.main <- lm(A_pho2 ~ (P1+P2+P3+P4+P5), data = tmp)
summary(lm.main)
```

The main effects were calculated by averaging over all chimeras with the CgPho4 region at the respective position. I'd like to break them down by background. For example, for region 3, I'd like to see the pairwise comparison between CCCSS and CCSSS, where only region 3 differs. The steps are:

1.  Select the region to be compared and split the symbol into two parts: the genotype of the focal region and the rest.
2.  Group by the second part (the rest) and calculate the differential.
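
The two steps can be sketched on a toy table in base R. The activity values and the column name `A` are invented for illustration. The actual function below works on `ximera` with tidyverse verbs.

```r
# toy data: activity values are invented for illustration
toy <- data.frame(
  symbol = c("CCSSS", "CCCSS", "SSSSS", "SSCSS"),
  A = c(2, 5, 1, 1.5),
  stringsAsFactors = FALSE
)
focal <- 3  # focal region

# step 1: split the symbol into the focal genotype (fg) and the rest (bg),
# with the focal position masked by "X"
toy$fg <- substr(toy$symbol, focal, focal)
toy$bg <- toy$symbol
substr(toy$bg, focal, focal) <- "X"

# step 2: within each background, take the Cg minus Sc difference
cg <- toy[toy$fg == "C", c("bg", "A")]
sc <- toy[toy$fg == "S", c("bg", "A")]
eff <- merge(cg, sc, by = "bg", suffixes = c("_C", "_S"))
eff$dA <- eff$A_C - eff$A_S
print(eff)
```

In this toy case, swapping region 3 has a larger effect on the CCXSS background than on the SSXSS background, which is the kind of background dependence the breakdown is meant to expose.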

```{r}
my_calc_region_effect <- function(region, variable){
  # this function takes a focal (foreground) region and the name of a variable
  # of interest. `region` specifies the foreground region, which will be
  # examined for its effect on the variable of interest.
  # it then transforms the ximera data frame to preserve only the variable of
  # interest, pivots it wider after grouping by the background composition.
  
  # prepare the data by mutating the symbol column into fg and bg
  valid.var <- c("A_PHO2", "A_pho2", "rA_PHO2", "rA_pho2", "boost")
  if(!variable %in% valid.var)
    stop(paste0("Please specify one of the valid variable names: ", 
                paste(valid.var, collapse = ", ")))
  tmp <- ximera %>% 
    filter(set == "M") %>% 
    select(plasmid, symbol, var = {{ variable }}) %>% 
    mutate(fg = str_sub(symbol, region, region) %>% toupper(),
           bg = symbol %>% toupper())
  # replace the foreground region with X for grouping
  str_sub(tmp$bg, region, region) <- "X"
  # reorganize the tibble for easier handling, optional
  tmp <- relocate(tmp, fg, bg, .before = symbol) %>% select(-symbol)
  # pivot the data into a wide format such that for each background, there
  # are two values for the variable of interest, one from the chimera with 
  # CgPho4's version in the foreground and another with ScPho4's version
  tmp <- tmp %>% 
    select(plasmid, fg, bg, var) %>% 
    pivot_wider(id_cols = bg, names_from = "fg", 
                values_from = c(plasmid, var)) %>% 
    unite(plasmid, starts_with("plasmid")) %>%
    mutate(label = paste(bg, plasmid, sep = "\n"))
  return(tmp)
}
```

```         
x = 5
p1 <- my_plot_region_effect_onevar(x, "A_PHO2")
p2 <- my_plot_region_effect_onevar(x, "A_pho2")
subplot(p1, p2, margin = 0.05) %>% 
  layout(title = paste("Region", x, "swap effect on A_PHO2 and A_pho2", sep = " "),
         xaxis = list(title = paste0("Region ", x, " from CgPho4")),
         yaxis = list(title = paste0("Region ", x, " from ScPho4")) )
```

Here, I'd like to take what I built above and create a new tibble, in which each row is a different background (the makeup of the chimera except for the focal region). The value columns are:

1.  dA_PHO2 = A_PHO2_Cg - A_PHO2_Sc
2.  dA_pho2 = A_pho2_Cg - A_pho2_Sc
3.  A_PHO2_Sc = A_PHO2_Sc

The goal is to plot dA_PHO2 and dA_pho2 side-by-side for each background.

```{r}
my_comp_region_effect <- function(region){
  # this function uses my_calc_region_effect to get the value for the variable of interest
  # with either Cg or Sc version in the focal region, separately for each background composition
  # it does so for two variables, A_PHO2 and A_pho2, then calculate dA_PHO2, dA_pho2, and
  # combine them
  PHO2 = my_calc_region_effect(region, "A_PHO2") %>% 
    mutate(dA_PHO2 = var_C - var_S,
           # mean A_PHO2
           M_PHO2 = (var_S + var_C)/2,
           NF = M_PHO2 <= 3.5) %>% 
    select(-var_S, -var_C)
  
  pho2 = my_calc_region_effect(region, "A_pho2") %>% 
    mutate(dA_pho2 = var_C - var_S, 
           M_pho2 = (var_S + var_C)/2) %>% 
    select(-var_S, -var_C)
  
  dat <- full_join(PHO2, pho2, by = c("bg", "plasmid", "label")) %>% 
    select(bg, plasmid, dA_PHO2, dA_pho2, M_PHO2, M_pho2, NF)
  
  return(dat)
}


```

```{r}
my_plot_region_effect_twovar_line("1", "4")# %>% ggplotly()
ggsave("../img/20240310-region-swap-effect-1-on-4.png", width = 6, height = 4)
my_plot_region_effect_twovar_line("3", "4")# %>% ggplotly()
ggsave("../img/20240310-region-swap-effect-3-on-4.png", width = 6, height = 4)

```

```         
my_plot_region_effect_twovar_line("4", "5")# %>% ggplotly()
ggsave("../img/20231221-region-swap-effect-4-on-5.png", width = 6, height = 4, dpi = 150)
my_plot_region_effect_twovar_line("5", "4")# %>% ggplotly()
ggsave("../img/20231224-region-swap-effect-5-on-4.png", width = 6, height = 4, dpi = 200)
```

The main plotting functions are now in a separate script file in `../script`. The plotting function below is to adapt the plot for a figure in the paper, simultaneously showing regions 1-3.

```{r}
my_plot_region_effect_twovar_line_par <- function(regions){
  # this function uses my_comp_region_effect to generate the data
  # and plot the difference in A_PHO2 and A_pho2 between the CgPho4 vs ScPho4
  # in the focal region
  # name the input so that .id records the region number rather than its index
  dat <- map_dfr(set_names(regions), \(region) my_comp_region_effect(region), .id = "region") %>% 
    pivot_longer(cols = c(dA_PHO2, dA_pho2), 
                 names_to = "host", values_to = "diff") %>% 
    mutate(host = fct_recode(host, `PHO2` = "dA_PHO2", `pho2∆` = "dA_pho2"),
           host = fct_relevel(host, "PHO2"))
  # specify grouping variable
  dat <- mutate(dat, 
                grp = str_sub(bg, 4, 4) %>% toupper(),
                grp = fct_recode(grp, CgPho4 = "C", ScPho4 = "S"))#,
                #sh = str_sub(bg, 5, 5) %>% toupper(),
                #sh = fct_recode(sh, CgPho4 = "C", ScPho4 = "S") )
  # specify arrow annotation
  arrow.x = 0.7
  arrow.y = (max(dat$diff) - min(dat$diff)) / 5 
  # plot
  p <- dat %>% 
    ggplot(aes(x = host, y = diff, label = bg)) +
    geom_point(aes(color = grp), size = 2, alpha = 0.8,
               position = position_jitter(0.1)) + 
    geom_line(aes(group = bg), linewidth = 0.2, alpha = 0.8) +
    facet_grid(region ~ grp, labeller = labeller(
      grp = c(CgPho4 = "P2ID:Cg", ScPho4 = "P2ID:Sc"),
      region = label_both
    )) +
    scale_color_manual("P2ID:", values = c("orange", "gray30"), guide = "none") +
    #scale_shape_manual("DBD:", values = c(19, 1)) +
    ylab("Region swap effect (Cg-Sc)") +
    theme_bw(base_size = 18) + 
    theme(
      axis.title.x = element_blank(),
      axis.title.y = element_text(size = rel(0.9)),
      axis.text.x = element_text(face = 3),
      axis.text.y = element_text(size = rel(0.8)),
      legend.text = element_text(size = rel(0.8)),
      legend.title = element_text(size = rel(0.9)),
      legend.position = "top",
      strip.background = element_blank()
    )
  return(p)
}
```

```{r}
my_plot_region_effect_twovar_line_par(c(1,3))
ggsave("../img/20240310-region-swap-effect-1n3-on-4.png",
       width = 5, height = 3.5)
```

```{r}
my_plot_region_effect_twovar_side <- function(region){
  # this function uses my_comp_region_effect to generate the data
  # and plot the difference in A_PHO2 and A_pho2 between the CgPho4 vs ScPho4
  # in the focal region
  dat <- my_comp_region_effect(region) %>% 
    pivot_longer(cols = c(dA_PHO2, dA_pho2), 
                 names_to = "host", values_to = "diff") %>% 
    mutate(host = fct_recode(host, `PHO2` = "dA_PHO2", `pho2` = "dA_pho2"),
           host = fct_relevel(host, "PHO2"))
  p <- dat %>% 
    ggplot(aes(x = bg, y = diff, group = host)) +
    geom_col(aes(fill = host), position = position_dodge(0.9)) +
    scale_fill_manual(values = host.colors) +
    ylab("Region swap diff (Cg vs Sc)") +
    theme_cowplot(font_size = 20) + 
    panel_border(color = "gray30") +
    background_grid(major = "y", minor = "none") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, family = "courier"),
          axis.title.x = element_blank(),
          legend.position = "top")
  return(p)
}
```

## P2ID:Cg_DBD:Sc fail

Highlight the subset of chimeras with P2ID:Cg + DBD:Sc, most of which are non-functional.

```{r}
# assign the plot so that ggsave() saves this figure, not a stale `p`
p <- my_scatter_plot("XXXXCS") +
    labs(x = bquote(A[PHO2]), y = bquote(A[pho2]))
p
ggsave(filename = "../img/20240214-Pho4-chimeras-scatter-P2ID_Cg-DBD_Sc.png",
       plot = p, width = 6, height = 4, dpi = 300)
```

```{r}
x <- my_data_select(pattern = "XXXXCS", Set = "M")
my_data_prep(x) %>% 
  mutate(group = fct_recode(group, "chimera" = "n.f.")) %>% 
  my_plot_components()
ggsave("../img/20240213-P2ID_Cg-DBD_Sc-components.png", width = 8, height = 5)
```

## Triangle heatmap

First, write a function to generate the data for plotting. If we are going to use ggplot, we need a tibble to store the data, something in the following form

| plasmid | symbol | RegionA | RegionB | A_PHO2 | A_pho2 | rA_PHO2 | boost | perc_pho2 |
|:--------|:-------|:--------|:--------|:-------|:-------|:--------|:------|:----------|
| 209     | CCSCC  | 3       | 3       | 8.25   | 7.82   | 0.468   | 1.06  | 0.94      |

If we are ok with using non ggplot - heatmaps are not ggplot's strength anyways - we can just build a matrix.

Note that this way of summarizing the data has several limitations: 1) it requires specifying the reference, either CCCCC or SSSSS, and everything is measured against that; 2) it only shows pairwise (two-region) interactions. This turns out to be fine with five regions, since every chimera can be expressed as either a 0, 1 or 2 region swap from one of the two reference genotypes. With 6 or more regions, higher level (3 or more region) interactions cannot be visualized this way. Because of this, we will focus on just the main set for this analysis.
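
The claim about five versus six regions can be checked by enumeration. The helper `count_swaps()` below is a hypothetical name introduced only for this check. It lists all S/C genotypes for `n` regions and reports the number of swaps from the nearer of the two references.

```r
count_swaps <- function(n) {
  # all 2^n genotypes, one column per region
  g <- expand.grid(rep(list(c("S", "C")), n), stringsAsFactors = FALSE)
  nC <- rowSums(g == "C")
  # distance to the nearer reference (all-S or all-C)
  pmin(nC, n - nC)
}

swaps5 <- table(count_swaps(5))  # 2 references, 10 single and 20 double swaps
print(swaps5)
max(count_swaps(6))              # with 6 regions, some chimeras need 3 swaps
```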

To build the matrix, we need to first identify the chimeras that belong to the set. For that, we will use the "main" set, with the five region split, for the moment at least. The function will first determine which reference to use. If we use SSSSS as the reference, for example, we will assign 0 to the reference. All other chimeras with 1 or 2 regions from Cg will be used to fill an upper triangular matrix, using one of the values of interest, e.g., A_PHO2.
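
To make the indexing concrete, here is a minimal base R sketch of how a symbol maps to a matrix cell with SSSSS as the reference. The values are invented, and `which(strsplit(...))` is used as a base R stand-in for `str_locate_all()`. It is meant only to illustrate the layout, not to replace the function below.

```r
# toy values, not measured data
toy <- data.frame(symbol = c("SSSSS", "CSSSS", "SSCSS", "CSCSS"),
                  A_PHO2 = c(1, 3, 2, 6), stringsAsFactors = FALSE)
mat <- matrix(NA_real_, nrow = 5, ncol = 5)
ref <- NA_real_
for (i in seq_len(nrow(toy))) {
  # positions carrying the alternative (Cg) allele
  pos <- which(strsplit(toy$symbol[i], "")[[1]] == "C")
  v <- toy$A_PHO2[i]
  if (length(pos) == 0) {
    ref <- v                    # reference genotype
  } else if (length(pos) == 1) {
    mat[pos, pos] <- v          # single swap goes on the diagonal
  } else if (length(pos) == 2) {
    mat[pos[1], pos[2]] <- v    # double swap goes above the diagonal
  }
}
mat <- mat - ref                # express everything relative to the reference
mat[c(1, 3), c(1, 3)]
```

Here CSSSS fills cell [1, 1], SSCSS fills [3, 3], and the double swap CSCSS fills [1, 3], while the lower triangle stays `NA`.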

```{r}
my_upper_triangular_mat <- function(alt = "C", var = "A_PHO2", nf.as.na = FALSE){
  # given the alternative allele (C/S) and a variable of interest, e.g., A_PHO2,
  # output an upper triangular matrix containing the values from the variable 
  # of interest, with the row and col numbers based on the first and second
  # positions containing the alternative allele. If all positions contain the 
  # reference allele, the value is subtracted from all values in the matrix
  # when just one position is the alternative allele, the value in the diagonal
  # is set. when there are more than 2 regions containing the alternative allele
  # skip.
  # if "nf.as.na = TRUE", non functional chimeras (group == "n.f.") are dropped
  # before filling the matrix, so the corresponding matrix values remain NA
  out_mat <- matrix(NA, nrow = 5, ncol = 5)
  ref_val <- NA
  dat <- filter(ximera, set == "M") %>% 
    mutate(S = as.character(symbol) %>% toupper())
  if(nf.as.na){
    dat <- filter(dat, group != "n.f.")
  }
  for(i in seq(1, nrow(dat))){
    # extract a character scalar; dat[i, "S"] would return a 1x1 tibble
    symbol = dat[["S"]][i]
    # determine which positions contain the alternative allele
    p = str_locate_all(symbol, alt)[[1]][,"start"]
    l = length(p)   # how many positions contain the alt allele
    v = dat[[var]][i] # retrieve the value of the variable
    if(l == 0)
      ref_val = v
    else if(l == 1)
      out_mat[p, p] = v
    else if(l == 2)
      out_mat[p[1], p[2]] = v
  }
  out_mat = out_mat - ref_val
  return(out_mat)
}
```

```{r}
my_combined_triangular_mat <- function(alt = "C"){
  # given the alternative allele (C/S), output a matrix containing the values
  # for both with and without Pho2, arranged in two complementary triangular
  # matrices, with the row and col numbers based on the first and second
  # positions containing the alternative allele. If all positions contain the 
  # reference allele, the value is subtracted from all values in the matrix
  # when just one position is the alternative allele, the value in the diagonal
  # is set. when there are more than 2 regions containing the alternative allele
  # skip.
  out_mat <- matrix(NA, nrow = 6, ncol = 6)
  upper <- cbind(NA, my_upper_triangular_mat(alt, var = "A_PHO2")) %>% 
    rbind(., NA)
  lower <- rbind(NA, t(my_upper_triangular_mat(alt, var = "A_pho2"))) %>% 
    cbind(., NA)
  out_mat = ifelse(is.na(upper), lower, upper)
  return(out_mat)
}
```

```{r}
my_plot_triangle_heatmap <- function(alt, var){
  # this function takes the output of the function above and makes a heatmap
  # using pheatmap function, then rotates it using grid graphics
  # thanks to https://bookdown.org/rdpeng/RProgDA/the-grid-package.html#grid-graphics-coordinate-systems
  # adding title based on https://davetang.github.io/muse/pheatmap.html
  
  # construct title of plot
  ref = ifelse(alt == "C", "ScPho4", "CgPho4")
  bg = ifelse(var == "A_PHO2", "with PHO2", "w/o pho2")
  my_title <- paste("Epistasis between regions on", ref, "background", bg)
  test <- my_upper_triangular_mat(alt = alt, var = var)
  paletteLength = 50
  myColors <- colorRampPalette(c("steelblue3", "gray90", "red"))(paletteLength)
  rng <- max(abs(test), na.rm = TRUE)
  myBreaks <- c(seq(-rng, 0, length.out=ceiling(paletteLength/2) + 1), 
                seq(rng/paletteLength, rng,
                    length.out=floor(paletteLength/2)))
  p <- pheatmap::pheatmap(test, color = myColors, breaks = myBreaks,
                          border_color = NA, na_col = NA, silent = TRUE,
                          cluster_cols = FALSE, cluster_rows = FALSE)
  vp <- viewport(x = 0.5, y = 0.25,
                 width = unit(4.5, "in"), height = unit(4.5, "in"), angle = 47) 
  grid.newpage()
  pushViewport(vp)
  grid.draw(p$gtable)
  popViewport()
  grid.text(label = my_title, x = 0.5, y = 0.95, gp = gpar(fontsize = 16, fontface = "bold"))
  return(p)
}
```

```{r}
my_plot_combined_triangle_heatmap <- function(alt){
  # this function takes the output of the function my_combined_triangular_mat()
  # using pheatmap function, then rotates it using grid graphics
  # thanks to https://bookdown.org/rdpeng/RProgDA/the-grid-package.html#grid-graphics-coordinate-systems
  # adding title based on https://davetang.github.io/muse/pheatmap.html
  
  # construct title of plot
  ref = ifelse(alt == "C", "ScPho4", "CgPho4")
  my_title <- paste("Epistasis between regions on", ref, "background")
  test <- my_combined_triangular_mat(alt = alt)
  paletteLength = 50
  myColors <- colorRampPalette(c("steelblue", "gray90", "red"))(paletteLength)
  rng <- max(abs(test), na.rm = TRUE)
  myBreaks <- c(seq(-rng, 0, length.out=ceiling(paletteLength/2) + 1), 
                seq(rng/paletteLength, rng,
                    length.out=floor(paletteLength/2)))
  p <- pheatmap::pheatmap(test, color = myColors, breaks = myBreaks,
                          border_color = NA, na_col = NA, silent = TRUE,
                          cluster_cols = FALSE, cluster_rows = FALSE)
  vp <- viewport(x = 0.5, y = 0.45,
                 width = unit(3, "in"), height = unit(2.8, "in"), angle = 47) 
  grid.newpage()
  pushViewport(vp)
  grid.draw(p$gtable)
  popViewport()
  grid.text(label = my_title, x = 0.5, y = 0.95, 
            gp = gpar(fontsize = 16, fontface = "bold"))
  grid.text(label = "With Pho2", x = 0.1, y = 0.65, just = c("left", "top"),
            gp = gpar(fontsize = 14, fontface = "bold"))
  grid.text(label = "Without pho2", x = 0.1, y = 0.25, just = c("left", "top"), 
            gp = gpar(fontsize = 14, fontface = "bold"))
  return(p)
}
```

```{r}
png("../img/20240115-triangle-heatmap-CgPho4-ref.png", width = 7, height = 5, units = "in", res = 300)
p1 <- my_plot_combined_triangle_heatmap("C")
dev.off()
png("../img/20240115-triangle-heatmap-ScPho4-ref.png", width = 7, height = 5, units = "in", res = 300)
p2 <- my_plot_combined_triangle_heatmap("S")
dev.off()
```
